Methodology
A defensible methodology for qualitative research analysis
Published 2026-05-02 · ~12 minute read
Senior UX researchers do not need another tag-and-quote tool. They need a tool that refuses to ship outputs that violate the craft. This essay describes the seven-stage pipeline behind insightful.cx, why each gate exists, and the four disciplines — tier and dual-axis scoring, governed language quantifiers, counter-evidence by default, and full back-traceability — that the pipeline enforces.
The methodology is opinionated. That is the point. A flexible tool will let you skip a stage, collapse a tier into a thumbs-up, ship a finding with no disconfirming voices on the slide, and choose “most participants” for a quote count of three because it reads better. A methodology-first tool refuses on your behalf.
The seven stages
The pipeline is fixed. Each stage produces a versioned artefact. Each artefact requires explicit researcher review and lock before the next stage runs. There is no “skip” button and no auto-publish path.
1. Ingestion
Ingestion is the typed import of transcripts and metadata into the project. It is not a free-text upload. The stage validates the transcript schema, binds each interview to a participant row, records the conducted-at date and the cohort, and persists the source file alongside the parsed lines. The artefact is a set of Interview and TranscriptLine rows with full character offsets preserved.
The gate exists because every later stage references the transcript by participant ID and offset. If ingestion is sloppy, every quote citation downstream is wrong. The reviewer locks the participant roster — name, cohort, consent posture — before speaker attribution begins.
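A minimal sketch of the ingestion artefact makes the shape concrete. The field names below are assumptions; the essay commits only to Interview and TranscriptLine rows, the participant binding, and preserved character offsets.

```typescript
// Sketch of the ingestion artefact (field names assumed).
interface Interview {
  id: string;
  participantId: string;   // bound to a row on the locked roster
  cohort: string;
  conductedAt: string;     // ISO date the interview was conducted
  sourceFileUri: string;   // the persisted source file
}

interface TranscriptLine {
  id: string;
  interviewId: string;     // foreign key to Interview
  rawSpeakerLabel: string; // "Speaker 1", resolved in stage two
  text: string;
  charStart: number;       // character offsets into the source file,
  charEnd: number;         // preserved so later quote citations stay exact
}
```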
2. Speaker attribution
Speaker attribution binds each utterance to a participant. The tool proposes a mapping from raw speaker labels (often “Speaker 1”, “Speaker 2”) to the participant roster locked in stage one. The artefact is a per-transcript mapping table.
The gate exists because misattribution is silent and catastrophic. A single swapped speaker can flip the polarity of an entire pattern. The reviewer scrubs the mapping — not the transcript — and locks it. Once locked, the mapping cannot be changed without invalidating downstream stages.
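A sketch of the mapping artefact, with assumed field names:

```typescript
// Sketch of the per-transcript mapping table (field names assumed).
// The reviewer scrubs and locks this table, not the transcript itself.
interface SpeakerMapping {
  interviewId: string;
  rawSpeakerLabel: string; // "Speaker 2" as it appears in the transcript
  participantId: string;   // roster row locked at ingestion
  lockedAt?: string;       // once set, downstream stages may run;
                           // changing it invalidates them
}
```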
3. Per-transcript first-pass coding
First-pass coding is per-transcript by design. The tool proposes codes against each interview individually, with no awareness of the other interviews in the corpus. The artefact is a set of Code rows attached to Quote rows that span specific transcript lines.
The gate exists because cross-transcript coding flattens. If the model has already seen ten interviews when it codes the eleventh, it will code the eleventh in the language of the first ten — and the unique signal in the eleventh transcript will be lost. Per-transcript first-pass coding preserves the texture that pattern detection then operates on.
The reviewer works through the codes per transcript: rename, merge, split, delete. The transcript is not locked; the coding pass is.
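A sketch of the coding artefact, again with assumed field names; the essay commits only to Code rows attached to Quote rows that span transcript lines.

```typescript
// Sketch of the first-pass coding artefact (field names assumed).
interface Quote {
  id: string;
  interviewId: string;
  lineStart: number;   // TranscriptLine range the quote spans
  lineEnd: number;
  charStart: number;   // offsets carried forward from ingestion
  charEnd: number;
}

interface Code {
  id: string;
  interviewId: string; // per-transcript: no awareness of the rest of the corpus
  label: string;
  quoteIds: string[];  // the Quote rows this code attaches to
  stageRunId: string;  // the coding pass that proposed it
}
```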
4. Cross-transcript pattern detection
Pattern detection clusters codes into themes across the corpus. The artefact is a set of Theme rows referencing the codes that cluster into them, with a prevalence count attached. This is the first stage that touches the corpus as a whole.
The gate exists because a theme is a structural claim — “a recurring pattern across N participants” — and the definition of N must be auditable. The reviewer accepts, splits, or rejects clusters. A dedicated counter-example-search pass runs against each accepted theme, scanning the corpus for quotes that contradict the theme even if the synthesis missed them.
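A sketch of the theme artefact, with assumed field names; where the counter-example pass stores what it finds is also an assumption here.

```typescript
// Sketch of the pattern-detection artefact (field names assumed).
interface Theme {
  id: string;
  label: string;
  codeIds: string[];         // the codes that cluster into this theme
  prevalenceCount: number;   // N: participants the pattern recurs across
  counterQuoteIds: string[]; // filled by the counter-example-search pass
  stageRunId: string;
}
```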
5. Per-RQ synthesis
Synthesis is per-research-question, not per-theme. For each research question on the project, the tool drafts a finding: the answer, the tier, the prevalence and intensity scores, the supporting quotes, the counter quotes, and the caveats. The artefact is a Finding row with foreign keys to every quote it cites and the stage_run that produced it.
The gate exists because the research question is the unit of accountability. The deliverable is not a theme map; it is an answer to each question the project was commissioned to address. A research question that the corpus cannot answer is flagged as under-evidenced rather than fudged.
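A sketch of the finding artefact, with assumed field names. What the essay commits to is the foreign keys: every cited quote, and the stage_run that produced the row.

```typescript
// Sketch of the synthesis artefact (field names assumed).
interface Finding {
  id: string;
  researchQuestionId: string; // the unit of accountability
  answer: string;
  tier: "dominant" | "recurring" | "signal" | "outlier";
  prevalence: number;         // 1-5, reported separately from intensity
  intensity: number;          // 1-5, never folded into a composite
  supportingQuoteIds: string[];
  counterQuoteIds: string[];  // disconfirming voices, rendered by default
  caveats: string[];
  stageRunId: string;         // the stage_run that produced this row
}
```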
6. Narrative drafting
Narrative drafting renders the findings into a draft report and deck. The artefacts are a .docx and a .pptx file. The text is generated from the locked findings, with quantifier language mapped from the evidence (see below) and counter-evidence rendered on every finding card and slide.
The gate exists because language matters. The reviewer reads the draft, edits in place, and locks the narrative. Override decisions — for example, picking “several” over “a few” for a count of two — are recorded with a reason so the audit trail explains why the language differs from the count.
7. Highlight-reel curation
Reel curation produces a .mp4 of verbatim moments. The tool proposes clips per finding using the locked supporting and counter quotes; the reviewer picks, orders, and trims. The reel renders last, with FFmpeg, against the source video files on the operator's machine.
The gate exists because the reel is the most rhetorically powerful artefact and the most easily abused. A reel of only supporting clips is a propaganda piece. The tool will not allow that posture without a recorded override.
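For concreteness, a minimal rendering sketch, assuming the clip timestamps come from the locked quote citations; the paths, field names, and encoder settings are illustrative, not the tool's actual invocation.

```typescript
// Illustrative reel render: trim each clip, then join with ffmpeg's concat demuxer.
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

interface ReelClip { sourceVideo: string; start: string; end: string } // "HH:MM:SS"

function renderReel(clips: ReelClip[], outPath: string): void {
  const clipPaths = clips.map((clip, i) => {
    const clipPath = `clip-${i}.mp4`;
    // Re-encode so cuts are frame-accurate rather than keyframe-aligned.
    execFileSync("ffmpeg", [
      "-i", clip.sourceVideo, "-ss", clip.start, "-to", clip.end,
      "-c:v", "libx264", "-c:a", "aac", clipPath,
    ]);
    return clipPath;
  });
  // The concat demuxer joins the trimmed clips without another re-encode.
  writeFileSync("reel.txt", clipPaths.map((p) => `file '${p}'`).join("\n"));
  execFileSync("ffmpeg", ["-f", "concat", "-safe", "0", "-i", "reel.txt", "-c", "copy", outPath]);
}
```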
Why the gates cannot be skipped
A stage gate is a contract between two stages: the upstream stage promises a versioned, locked artefact; the downstream stage promises to operate only on that artefact. Skipping a gate breaks the contract.
The structural argument is simpler than the methodological one. Every artefact carries a version. Every model-derived row in every later stage carries a foreign key to the stage_run that produced it. If a stage is skipped, the chain is broken. Six months later, when a stakeholder asks “why did the deck say this?”, the answer is “we cannot reconstruct it.” That answer is unacceptable.
The methodological argument is that each gate is the only place where a specific class of error is cheap to fix. A misattributed speaker is cheap to fix at the attribution gate and ruinous to fix after coding. A missing counter-quote is cheap to add at synthesis and embarrassing to add after the deck has shipped. The gates are positioned where the work is.
The tool refuses to ship outputs that bypass a gate. That is not a feature flag. It is the spine.
Tier and dual-axis scoring, never collapsed
Tier is a categorical classification of a finding's standing in the corpus. Tier is not a confidence score and not a star rating. The four tiers are:
- Dominant — the pattern shows up across most participants and the evidence is unambiguous.
- Recurring — the pattern shows up across several participants with consistent shape.
- Signal — the pattern shows up in a small number of participants but is sharp enough to warrant attention.
- Outlier — a single participant says something striking that does not generalise but is worth naming.
Prevalence is a 1–5 score for how widely the pattern shows up across participants. Intensity is a 1–5 score for how strongly any one participant expresses it. The two axes are independent and report different things.
A high-prevalence, low-intensity finding (“most members mentioned the rate-rise email; nobody felt strongly about it”) is a different finding from a low-prevalence, high-intensity finding (“two members described the rate-rise email as a betrayal”). One number cannot tell those apart. Four fields — tier, prevalence, intensity, and a participant count — can.
The tool refuses to render a single composite score. A stakeholder who insists on one is asking for the wrong artefact. The data is on the page; the integration is the stakeholder's responsibility, not the tool's.
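To make the refusal concrete, here is a sketch of the two rate-rise findings above with illustrative scores; the counts and values are invented for the example.

```typescript
// Two findings with illustrative scores. Any composite (an average, a weighted
// sum) maps them to similar numbers and erases the distinction that matters.
type Tier = "dominant" | "recurring" | "signal" | "outlier";

interface ScoreBlock { tier: Tier; prevalence: number; intensity: number; participantCount: number }

// "Most members mentioned the rate-rise email; nobody felt strongly about it."
const rateRiseNoticed: ScoreBlock =
  { tier: "dominant", prevalence: 5, intensity: 1, participantCount: 10 }; // illustrative count

// "Two members described the rate-rise email as a betrayal."
const rateRiseBetrayal: ScoreBlock =
  { tier: "signal", prevalence: 1, intensity: 5, participantCount: 2 };

// (prevalence + intensity) / 2 is 3 for both; the composite cannot tell them apart.
```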
How language quantifiers map from evidence
Quantifier words — “every”, “most”, “several”, “a few”, “one participant” — are governed, not stylistic. The mapping is a fixed table, applied at narrative drafting:
- 5 of 5 participants → “every participant”
- 4 of 5 → “most participants”
- 3 of 5 → “several participants”
- 2 of 5 → “a few participants”
- 1 of 5 → “one participant”
The cohort size is the denominator. For a 12-participant cohort, the bands scale: “most” means 8 or more, “several” means 4–7, and so on. The mapping is project configuration, not free text.
Researchers can override. A worked override looks like this. The count is two; the default mapping is “a few”; the researcher writes “a small but notable subgroup” and records the reason: “both participants are in the high-spender cohort and the pattern is concentrated.” The override is on the audit trail. A future reader sees both the count and the reason and can decide whether the framing was fair.
This is not a style guide. A style guide is advisory. The mapping is enforced — the default is generated from the evidence, and any divergence carries a reason on disk.
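A sketch of how the default mapping and the override record might be represented; the band boundaries below are assumed to live in project config, as the essay describes, rather than being hard-coded.

```typescript
// Sketch of the governed quantifier mapping (names and band shape assumed).
type Quantifier =
  | "every participant" | "most participants" | "several participants"
  | "a few participants" | "one participant";

interface QuantifierBands {
  mostAtLeast: number;    // e.g. 4 for a 5-person cohort, 8 for a 12-person cohort
  severalAtLeast: number; // e.g. 3 for 5 participants, 4 for 12
}

function defaultQuantifier(count: number, cohortSize: number, bands: QuantifierBands): Quantifier {
  if (count >= cohortSize) return "every participant";
  if (count >= bands.mostAtLeast) return "most participants";
  if (count >= bands.severalAtLeast) return "several participants";
  if (count >= 2) return "a few participants";
  return "one participant";
}

// An override never replaces the default silently; both sit on the audit trail.
interface QuantifierOverride {
  findingId: string;
  count: number;
  defaultPhrase: Quantifier; // e.g. "a few participants"
  overridePhrase: string;    // e.g. "a small but notable subgroup"
  reason: string;            // why the framing diverges from the count
}
```

Keeping the default phrase next to the chosen phrase is what lets a future reader see both the count and the reason, as in the worked override above.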
Counter-evidence is rendered by default
Disconfirming voices are surfaced alongside confirming ones. Every finding card shows both. Every slide that carries a finding shows both. Every section in the report that asserts a claim shows both. There is no “hide counter-evidence for the executive summary” setting.
Two passes do the work. The synthesis pass asks the model for both supporting and disconfirming quotes against the draft answer. The dedicated counter-example-search pass scans the full corpus for quotes that contradict the finding even if the synthesis missed them — explicitly looking for what the synthesis would prefer to forget.
When the search returns nothing, the finding card still renders an empty counter-evidence block, labelled “tested against the contrary view; none surfaced in this dataset.” The empty state is itself a defensible outcome and is explicitly named on the page rather than silently omitted.
The deeper essay on this principle is at /why-counter-evidence.
Back-traceability
Every claim in every output links to a chain: finding → quote → transcript line → participant → conducted-at → video timestamp → stage_run that produced it. The chain is structural, not aspirational. A foreign key on every model-derived row enforces it at the database level.
In the UI, every finding card has a “why did the tool say that?” expander. The expander shows the prompt sent to the model, the model used, the token counts, the raw output, the stage version, and the timestamp. The trace stops only at “the model said this” — and at that boundary, the audit trail records what the model was given and what it returned, verbatim.
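A sketch of the record behind that expander, with assumed field names; the essay lists the contents, not the schema.

```typescript
// Sketch of the trace stored per stage_run (field names assumed).
interface StageRunTrace {
  stageRunId: string;
  stage: string;            // e.g. "synthesis"
  stageVersion: string;
  model: string;            // which model was called
  prompt: string;           // the exact prompt sent to the model
  rawOutput: string;        // the model's reply, verbatim
  promptTokens: number;
  completionTokens: number;
  completedAt: string;      // ISO timestamp
}
```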
The deeper essay on this principle is at /back-traceability.
Under-evidenced research questions
“We cannot answer this with this dataset” is a valid finding. The pattern is to flag, not finesse.
An under-evidenced research question carries no tier and no dual-axis score. It carries a flag, a one-paragraph explanation of what is missing, and a recommendation about what would close the gap — usually more interviews of a specific shape, occasionally a different recruitment screen. The deck and the report render this finding the same way they render any other: explicitly, with the flag visible.
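A sketch of the two shapes an answer can take, with assumed names:

```typescript
// Sketch of the evidenced / under-evidenced split (names assumed).
// The under-evidenced shape carries no tier and no scores, by construction.
type RQAnswer =
  | {
      kind: "evidenced";
      findingId: string;
      tier: "dominant" | "recurring" | "signal" | "outlier";
      prevalence: number;
      intensity: number;
    }
  | {
      kind: "under-evidenced";
      whatIsMissing: string;  // one-paragraph explanation of the gap
      recommendation: string; // e.g. more interviews of a specific shape
    };
```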
This is hard. The social pressure inside a research-to-deliverable workflow rewards a clean answer; an under-evidenced flag reads as a failure to deliver. The discipline is to insist that an honest flag is a stronger deliverable than a fabricated answer. A senior researcher will agree; a stakeholder may push back. The tool sides with the researcher.
What this methodology refuses to do
The list of refusals is the product, as much as the list of features:
- Auto-publish. There is no path from a draft finding to a shipped artefact that bypasses the reviewer.
- Auto-skip. There is no path that bypasses a stage gate.
- Single-score collapse. Tier, prevalence, and intensity are independent and never combined.
- Hidden disconfirming voices. Counter-evidence renders by default on every surface.
- Stylistic quantifier choice. Language quantifiers map from the evidence; overrides carry a reason.
- Untraceable claims. A claim with no chain to a quote, a participant, a transcript offset, and a stage_run cannot be rendered.
These refusals are the load-bearing edges of the product. They are not configurable. A team that needs them configurable is a team that needs a different tool — which is fine, and /vs-general-purpose-tools describes that trade-off honestly.
Read more
Why we render counter-evidence by default — the worked example, the empty-state semantics, the two passes that do the work.