MaxDiff (Best-Worst Scaling)

MaxDiff (Best-Worst Scaling) is the method of choice when you need to rank a long list of items by relative importance — features, claims, policy options, values, attributes — and a plain rating scale flattens everything into "all important". Instead of rating each item in isolation, respondents repeatedly see a small subset and pick the best and the worst in each. Across many such cards every item is shown several times in different combinations, producing a far more discriminating, trade-off-based ranking than a Likert scale or a single best/worst question.

tickStat is a survey and experiment platform built by academic researchers, for academic researchers, and the full Best-Worst Scaling workflow lives on a single platform: defining the item list, generating a frequency-balanced experimental design, the mobile-friendly best/worst interaction, and an integrated MaxDiff Analysis that turns the raw choices into utilities and preference shares.

Why tickStat for MaxDiff

Frequency-balanced designs, generated for you. Set items per card and number of cards, and tickStat builds a design where each item appears about the same number of times and item pairs are spread evenly. Prefer full control? Paste your own design, one card per line — it is validated against your item list.
A clean, mobile-first respondent experience. A single paginated screen walks through the cards one at a time with a progress bar; on each card the respondent picks one best and one worst item. The cards and item order are stable across page reloads, so refreshing never loses answers — important for fieldwork on phones outside the lab.
Balanced exposure across respondents. Present every generated card or only a subset per respondent; with uniform rotation the subset window rotates across respondents so every card gets balanced exposure. Item order within each card can be randomised to remove position effects.
Integrated MaxDiff Analysis — no export required. A model-free Best-Worst counts table (times shown, best, worst, B−W score, rank) and a horizontal bar chart, plus an aggregate best-worst conditional logit estimated in R with relative utilities, standard errors, z, p-values and significance stars.
Preference shares, ready to communicate. Utilities are rescaled into preference shares that sum to 100% — "item A would be picked best ~40% of the time, item F only ~5%" — the standard, publication-ready summary of MaxDiff results.
Valid completes only, and raw data on demand. The analysis includes only respondents who finished the survey (status Done); screen-outs and in-progress sessions are excluded. The Complete Report export adds a dedicated MaxDiff Results sheet with one row per respondent and card for full transparency.
Multilingual fielding in 14 languages. Run the same MaxDiff study across countries with a single survey definition.
GDPR and EU AI Act compliant. Important for academic researchers running publication-grade studies under the current regulatory regime.

What's on this page

Below: how to define the item list, configure the MaxDiff design (items per card, number of cards, balanced or uploaded design), set the presentation options, run the best/worst interaction, and read the integrated MaxDiff Analysis — Best-Worst counts, the aggregate best-worst logit and preference shares.

If you are new to the platform, start with the Getting started guide.

MaxDiff (also called Best-Worst Scaling; the format familiar from Sawtooth Software) measures the relative importance of a list of items by repeatedly showing respondents a small subset of the items and asking them to pick the best and the worst one in each subset. Across many such tasks, every item is shown several times in different combinations, which yields a far more discriminating ranking than a single rating scale or a one-screen best/worst question.

To create one, go to Questions → New Question and select MaxDiff (Best-Worst Scaling) from the question type list.

How it works

A MaxDiff question follows the same mother → child-tasks pattern as the Discrete Choice Experiment:

You define the full item list (the things to be evaluated) as the question's answers, exactly like any other question — each item has a title, an optional description, and an optional Condition to visualize (a logical condition, true by default, that controls whether the item is shown — the same per-answer condition mechanism used in other question types).
You configure the design (see below) and save. tickStat then generates the tasks (cards) as child questions, one per task. Both the mother question and its generated tasks appear in the question list — the tasks are shown grouped and color-coded under the mother and cannot be reordered individually.
During the survey the respondent sees a single screen with a paginated widget that walks through the tasks one card at a time. On each card the respondent selects one best and one worst item (the same item cannot be both). A progress bar shows how many cards remain. When every presented card is answered, the survey advances.

Configuration

Best/Worst labels and colors
- Best label / Best color and Worst label / Worst color: the text and accent color used for the two selection columns on each card.

MaxDiff design
- Items per card: how many items appear on each card (default 4; typically 4–5).
- Number of cards to generate: how many cards the design contains (default 10). Rule of thumb: use enough cards that each item is seen about three times, that is, cards ≈ 3 × (number of items) ÷ items per card (for example, 12 items with 4 per card → ~9 cards).
- Design source:
    - Let tickStat generate a balanced design: tickStat builds a frequency-balanced design so each item appears roughly the same number of times and item pairs are spread evenly across the cards.
    - Upload my own design: paste your own design, one card per line with item ids separated by commas (for example R1,R3,R5,R8). Each line must contain exactly as many ids as the Items per card setting, and every id must be distinct and match an existing item.
- Generate / regenerate cards on save: check this box to (re)build the child cards when you save. Leave it unchecked to edit the configuration or item labels without rebuilding the design. Regenerating preserves card ids where possible so existing display logic is not invalidated.
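The rule of thumb and the upload checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `suggested_cards` and `validate_design` are hypothetical, and tickStat's own validation may apply further rules or report errors differently.

```python
import math


def suggested_cards(n_items, items_per_card, target_exposures=3):
    """Rule of thumb: enough cards that each item is seen about target_exposures times."""
    return math.ceil(target_exposures * n_items / items_per_card)


def validate_design(text, item_ids, items_per_card):
    """Parse a pasted design (one card per line, comma-separated ids).

    Raises ValueError on the first card that has the wrong size,
    a repeated id, or an id that is not in the item list.
    """
    cards = []
    for line_no, line in enumerate(text.strip().splitlines(), start=1):
        ids = [part.strip() for part in line.split(",") if part.strip()]
        if len(ids) != items_per_card:
            raise ValueError(
                f"line {line_no}: expected {items_per_card} items, got {len(ids)}"
            )
        if len(set(ids)) != len(ids):
            raise ValueError(f"line {line_no}: duplicate item ids")
        unknown = [i for i in ids if i not in item_ids]
        if unknown:
            raise ValueError(f"line {line_no}: unknown item ids {unknown}")
        cards.append(ids)
    return cards
```

For 12 items shown 4 per card, `suggested_cards(12, 4)` gives 9, matching the worked example in the rule of thumb.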

MaxDiff presentation
- Number of cards to present per respondent: set to 0 (the default) to present all generated cards, or a smaller number to show only a subset to each respondent. When a subset is shown, Uniform rotates the window across respondents so every card gets balanced exposure.
- Random presentation order: off = Uniform (balanced rotation across respondents); on = Random order/selection per respondent.
- Randomize item order within each card: shuffles the position of the items inside each card per respondent.

The selection and order a respondent sees are stable across page reloads, so refreshing the page does not change the cards or lose answers already given.
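One way to picture uniform rotation and reload-stable ordering is the sketch below. It is not tickStat's implementation, which is not documented here: the sliding-window rule, the seed built from respondent and card ids, and the names `cards_for_respondent` and `stable_item_order` are all assumptions made for illustration.

```python
import hashlib
import random


def cards_for_respondent(n_cards, per_respondent, respondent_index):
    """Uniform rotation: a window of cards that shifts with each new respondent.

    per_respondent == 0 means "present all cards", as in the setting above.
    """
    if per_respondent == 0 or per_respondent >= n_cards:
        return list(range(n_cards))
    start = (respondent_index * per_respondent) % n_cards
    return [(start + k) % n_cards for k in range(per_respondent)]


def stable_item_order(items, respondent_id, card_id):
    """Shuffle items with a seed derived from respondent and card.

    The same respondent and card always produce the same order,
    so a page reload does not reshuffle the card.
    """
    digest = hashlib.sha256(f"{respondent_id}:{card_id}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled
```

With 10 cards and 4 per respondent, five consecutive respondents in this sketch cover every card exactly twice, which is the kind of balanced exposure the Uniform option aims for.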

Results

There are two places to see MaxDiff results: the raw captured answers in the Complete Report, and the statistical MaxDiff Analysis screen.

MaxDiff Results sheet (Complete Report)

The Complete Report (SPSS Export) includes a dedicated MaxDiff Results sheet whenever the survey contains MaxDiff questions. It lists one row per respondent and per answered card, showing the items presented on that card and the best and worst item the respondent chose. It includes respondents of every state and is meant for checking the captured raw data.

MaxDiff Analysis screen

Open it from Analysis → MaxDiff Analysis. The line is always listed in the Analysis section (so the method is discoverable); if the survey has no MaxDiff question the screen simply says so. Its purpose is to turn the raw best/worst choices into a ranking of the items and a statistical model of their relative preference.

Who is included: only respondents who completed the survey (status Done) enter the analysis. Early Screen Out, Low Quality, Quota Complete and in-progress sessions are excluded, so the numbers reflect only valid completes.

Choosing the question and running it: at the top there is a MaxDiff question selector and a Run analysis button. When you open the screen — or switch questions in the selector — it shows the last saved analysis for that question, if any, with a Last run timestamp; it does not recompute automatically. Click Run analysis to (re)compute with the current responses; the result is saved to disk and shown until the next run. If the question has never been analysed, an empty state invites you to run it. (This is the same "show the last run, re-run on demand" pattern used by Choice Analysis.)

The screen has two parts: a Best-Worst counts analysis (always available, computed directly from the answers) and an aggregate best-worst logit model (estimated in R). The logit pools each card's best choice and its worst choice (the worst modelled as the lowest-utility pick) into a single conditional logit; one item is fixed as the reference, and every other item's utility is expressed relative to it.
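To make the pooled model concrete, the sketch below evaluates the log-likelihood of a best-worst conditional logit for a given set of utilities. It is a simplified illustration, not the R code tickStat runs. In particular, it assumes the worst pick is modelled over all items on the card using negated utilities; some formulations exclude the item already chosen as best, and the source does not say which one is used. The function name `best_worst_loglik` is hypothetical.

```python
import math


def best_worst_loglik(utilities, observations):
    """Log-likelihood of pooled best and worst choices.

    utilities:    dict mapping item -> utility (reference item fixed at 0.0)
    observations: list of (shown_items, best_item, worst_item) tuples
    """
    total = 0.0
    for shown, best, worst in observations:
        # Best pick: ordinary logit over the items on the card.
        denom_best = sum(math.exp(utilities[i]) for i in shown)
        total += utilities[best] - math.log(denom_best)
        # Worst pick: logit on negated utilities, so the lowest-utility
        # item is the most likely "worst" (assumption: all shown items
        # remain in the choice set).
        denom_worst = sum(math.exp(-utilities[i]) for i in shown)
        total += -utilities[worst] - math.log(denom_worst)
    return total
```

An estimation routine would search for the utilities that maximise this value while holding the reference item at 0.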

Summary cards (top of the report):
- Respondents: number of Done respondents included in the analysis.
- Items: number of items in the MaxDiff list.
- Log-likelihood: log-likelihood of the fitted logit (closer to 0 is better).
- Pseudo R²: McFadden pseudo-R² of the logit (higher = better fit; values around 0.1–0.3 are typical and good).
- AIC / BIC: information criteria for model fit (lower is better; useful to compare models).
- Best/worst choices: number of best/worst choice observations the model was estimated on. Each answered card contributes two (one best pick and one worst pick), so this is roughly respondents × cards × 2, minus any unanswered cards. It is not the number of cards.
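The fit figures on the summary cards follow standard formulas, sketched below. The assumptions are labeled: the null model is taken to be "all utilities equal", the number of parameters is items minus one (one item is the reference), and BIC uses the number of choice observations as its sample size. The source does not confirm which conventions tickStat uses, and the function names are hypothetical.

```python
import math


def fit_statistics(loglik, null_loglik, n_params, n_choices):
    """McFadden pseudo-R², AIC and BIC from the model and null log-likelihoods."""
    return {
        "pseudo_r2": 1.0 - loglik / null_loglik,
        "aic": 2 * n_params - 2 * loglik,
        # Assumption: sample size for BIC = number of choice observations.
        "bic": n_params * math.log(n_choices) - 2 * loglik,
    }


def expected_choices(n_respondents, cards_per_respondent, unanswered_cards=0):
    """Each answered card contributes one best and one worst observation."""
    return 2 * (n_respondents * cards_per_respondent - unanswered_cards)
```

For example, 100 respondents answering 10 cards each would give `expected_choices(100, 10)` = 2000 choice observations, not 1000.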

The fit cards (Log-likelihood, Pseudo R², AIC, BIC, Best/worst choices) appear only when the logit ran.

Best-Worst counts table, one row per item:
- Item: the item being evaluated.
- Shown: number of cards (across all included respondents) on which the item appeared.
- Best: number of times the item was chosen as the best.
- Worst: number of times the item was chosen as the worst.
- B-W: Best-Worst score = Best − Worst. Higher means more preferred.
- B-W / shown: the B-W score divided by Shown. Ranges from −1 to +1 and is comparable across items regardless of how often each one appeared (+1 = chosen best every time it was shown; 0 = chosen best as often as worst; −1 = chosen worst every time).
- Rank: position when ordering by B-W score (1 = most preferred).
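The counts columns can be reproduced from raw answers with a short script, for instance when checking the table against exported data. This is a minimal sketch: the input layout (a list of shown items plus the best and worst pick per answered card) and the name `best_worst_counts` are assumptions, and ties in B-W are ranked here in arbitrary sequential order, which may differ from how tickStat breaks ties.

```python
from collections import Counter


def best_worst_counts(observations):
    """Build the counts table from (shown_items, best_item, worst_item) tuples."""
    shown, best, worst = Counter(), Counter(), Counter()
    for items, b, w in observations:
        shown.update(items)
        best[b] += 1
        worst[w] += 1

    rows = []
    for item in shown:
        bw = best[item] - worst[item]
        rows.append({
            "item": item,
            "shown": shown[item],
            "best": best[item],
            "worst": worst[item],
            "bw": bw,
            "bw_per_shown": bw / shown[item],  # always between -1 and +1
        })

    # Rank 1 = highest B-W score (most preferred).
    rows.sort(key=lambda r: r["bw"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows
```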

Bar chart: below the counts table, a horizontal bar chart shows the B-W score per item, ordered from most to least preferred. Hovering the chart reveals two buttons in the top-right corner — download (saves the chart as a PNG) and copy (copies the chart image to the clipboard).

Aggregate best-worst logit table, one row per item, from the conditional logit:
- Item: the item.
- Utility: estimated relative preference. One item is the reference (utility 0) and the rest are expressed relative to it; higher = more preferred. Utilities are on the logit scale, so only differences between items are meaningful (not the absolute value).
- Std. error: standard error of the utility estimate (precision; smaller = more precise).
- z: z-statistic (utility ÷ std. error).
- p-value: statistical significance of the estimate, shown with stars (*** p<0.01, ** p<0.05, * p<0.1). A significant value means the item's preference differs reliably from the reference item.
- Share %: rescaled preference share (softmax of the utilities), which sums to 100% across items. It is the probability the item would be chosen as best if all items were shown together, the standard interpretable summary of MaxDiff utilities. It follows the same ranking as Utility and as the counts.
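The derived columns can be recomputed from the utilities and standard errors, as in the sketch below. It assumes a two-sided normal test for the p-value and the conventional three-star level for p<0.01; neither detail is spelled out for tickStat's R output, and the function names are hypothetical.

```python
import math


def preference_shares(utilities):
    """Softmax of the utilities, expressed in percent (sums to 100)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exps = {item: math.exp(u - m) for item, u in utilities.items()}
    total = sum(exps.values())
    return {item: 100.0 * v / total for item, v in exps.items()}


def z_and_p(utility, std_error):
    """z-statistic and two-sided p-value under a standard normal."""
    z = utility / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def stars(p):
    """Significance stars: *** p<0.01, ** p<0.05, * p<0.1."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""
```

Because the softmax depends only on differences between utilities, changing the reference item shifts every utility by the same constant and leaves the shares unchanged.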

How to read it: the counts give a quick, model-free ranking; the logit confirms it statistically and the Share % turns it into easy-to-communicate preference weights (e.g. "item A would be picked best ~40% of the time, item F only ~5%"). The counts ranking and the logit/Share ranking should agree; large disagreements usually mean too few respondents or a near-tie between items.

States and troubleshooting:
- If the survey has no MaxDiff question, the MaxDiff Analysis line is still listed, and the screen says that there is nothing to analyse.
- If no Done respondents have answered yet, the tables are empty.
- If the R engine is unavailable or the model fails to converge, the counts table and bar chart are still shown and the logit section reports that it could not be computed. The counts analysis never depends on R.