v0 design preview · working title

Reflexive TA, with provenance.

An LLM-driven toolkit for qualitative researchers, built around Braun & Clarke's six phases.

This site is a design walkthrough of the v0 walking skeleton — twelve pipeline nodes, ten locked decisions, five deep dives — produced so a less-technical colleague can read the plan and stress-test it.

Funded by MASSHINE. All design choices in the v0 spec are auditable in natural language: every code, merge, and theme carries a written rationale and a verbatim span back to the source transcript.


The method

Six phases, translated.

Braun & Clarke's reflexive thematic analysis treats researcher subjectivity as a generative resource. MASSHINE maps the six phases onto auditable model calls — keeping the human in the loop where interpretation is non-substitutable. Where a phase is deliberately not automated in v0, the right column says so: the v0/post-v0 boundary is itself a design decision (spec §8/§9).

Braun & Clarke

Reflexive TA · 2006 / 2019 / 2021

Familiarization · Immersion in the data; building interpretive sensibility.
Generating initial codes · Tagging segments with semantic and (where warranted) latent readings.
Searching for themes · Constructing patterns of shared meaning, not clustering codes.
Reviewing themes · Testing each candidate against the data and against sibling themes.
Defining & naming · Writing a central-concept statement; naming it; scoping its boundary.
Producing the report · Narrative + illustrative quotes, written in the researcher's voice.

MASSHINE v0

LLM calls · file artifacts · human checkpoints

StructureDoc · Per-transcript section boundaries + one-line gists. Familiarization scaffold for the human.
Code (N personas) · 2–3 prompt variants code each excerpt with CoT rationale and verbatim span.
Theorist (single pass) · Writes a central-concept statement per candidate theme: a claim, not a label.
Checkpoint: themes · Human review splits, merges, discards. A separate critic agent is deliberately post-v0 (spec §8).
Central-concept statement · v0 stops at the argued CandidateTheme; per-theme definition objects with quote banks are the Stage-2 extension.
report.md + audit · v0 emits a gate-table report and the full provenance trail. Narrative drafting is post-v0; the human writes the interpretation either way.
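Here is a minimal sketch of the two core records, assuming Python dataclasses. The class and field names (Code, CandidateTheme, persona, char_start, needs_human_review) are hypothetical and are not taken from the spec. They only illustrate two ideas: each code can carry its own rationale and verbatim span, and latent codes can be routed to the human checkpoint.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass(frozen=True)
class Code:
    """One persona's code on one excerpt (hypothetical field names)."""
    code_id: str
    excerpt_id: str
    persona: str                         # which prompt variant produced it
    label: str
    kind: Literal["semantic", "latent"]
    rationale: str                       # written CoT rationale, kept for audit
    quote: str                           # verbatim span the code rests on
    char_start: int                      # offsets into the source transcript
    char_end: int

    @property
    def needs_human_review(self) -> bool:
        # Latent readings are where LLMs are weaker, so they go to the checkpoint.
        return self.kind == "latent"


@dataclass
class CandidateTheme:
    """A candidate theme argued by a central-concept statement."""
    theme_id: str
    name: str
    central_concept: str                 # a claim, not a label
    code_ids: list[str] = field(default_factory=list)
```

Freezing Code is a choice made in this sketch only. A changed reading becomes a new record rather than a silent edit.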

Semantic codes

Surface content. "She describes the house as having the barn attached." Handled well by LLMs; verified by verbatim-span match.

Latent codes

Underlying meaning. "The barn being part of the house is read as continuity of agrarian life." LLMs are weaker here; flagged for human review.

Reflexive memos

Drift notes from the reconcile step. Memos are prompts for human reflexivity, not substitutes for it.
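Drift memos from the reconcile step can be sketched as follows. This is a hypothetical illustration, not the v0 implementation. It assumes each persona emits one label per excerpt and that "drift" means the personas' labels disagree. Each memo is worded as a question, so it prompts the researcher rather than answering for them.

```python
from collections import defaultdict


def drift_memos(codes: list[dict]) -> list[str]:
    """Hypothetical reconcile pass: one memo per excerpt where personas disagree."""
    labels_by_excerpt: dict[str, dict[str, str]] = defaultdict(dict)
    for code in codes:
        # Assumes one label per persona per excerpt; a later code would overwrite.
        labels_by_excerpt[code["excerpt_id"]][code["persona"]] = code["label"]

    memos = []
    for excerpt_id, labels in sorted(labels_by_excerpt.items()):
        if len(set(labels.values())) > 1:
            readings = "; ".join(f"{p}: {lab}" for p, lab in sorted(labels.items()))
            # Phrased as a question: a prompt for reflexivity, not a verdict.
            memos.append(
                f"[{excerpt_id}] Personas diverge ({readings}). "
                "Which reading do you bring to this excerpt, and why?"
            )
    return memos
```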

The pipeline

One paragraph, seven stages.

Walk a single paragraph from Mary Grande's 1989 interview through the pipeline. At each stage the data takes a different form, and every change is auditable.

Below, the paragraph "I had to help with, when they killed the pigs I had to catch the blood. I didn't like it, but that was part of my job." travels from raw text to candidate theme. Read each card's summary, then the data, then the meta line. Stages marked editable are the ones a human reviews at the checkpoint. The full twelve-node pipeline — including reconcile, both checkpoints, the cross-family judge, and the audit — is mapped in Architecture & scale.
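One concrete operation behind this walk-through is pinning a quote to character offsets. The sketch below uses a plain exact-substring search; locate_span is a hypothetical helper. The offsets are relative to this paragraph, not to the full transcript. A quote that occurs more than once is rejected rather than guessed.

```python
def locate_span(source: str, quote: str) -> tuple[int, int]:
    """Return (start, end) offsets of quote in source; refuse missing or ambiguous quotes."""
    start = source.find(quote)
    if start == -1:
        raise ValueError(f"quote not found verbatim: {quote!r}")
    if source.find(quote, start + 1) != -1:
        raise ValueError(f"quote occurs more than once: {quote!r}")
    return start, start + len(quote)


paragraph = (
    "I had to help with, when they killed the pigs I had to catch the blood. "
    "I didn't like it, but that was part of my job."
)
span = locate_span(paragraph, "I had to catch the blood")
print(span)  # (46, 70)
```

A shorter quote such as "I had to" appears twice in this paragraph, so the sketch raises instead of silently picking the first match.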

Locked decisions

Ten choices, and the evidence for each.

Each v0 locked decision, with the spec clause, the research evidence, and a moment from a real Ellis Island transcript that shows why it matters.

Architecture & scale

Twelve nodes now, three stages later.

The full v0 pipeline with inputs and outputs per node, the triggers that pull in each post-v0 extension, and the staged path from 12 synthetic transcripts to the 1,343-transcript corpus. Where the spec deviates from the literature's "proven recipe" (embeddings, SQLite), the trigger to revisit is written down here.

Extension map — trigger → addition (spec §9)

Scale story — from walking skeleton to full corpus

Deep dives

Five artifacts, five uses.

A pressure test, a systems comparison, and three worked transcript runs — each exportable as markdown for design conversations with colleagues and reviewers.

How these examples were made: the runs below are hand-authored illustrations of the v0 artifact formats, drawn from real Ellis Island Oral History Project transcripts (every quote is string-verified against the source — see tools/verify_quotes.py). v0 itself runs on synthetic data only (spec §3); the Ellis Island corpus enters at Stage 2.

Audit & limits

Four gates, and what we won't do.

MASSHINE's exit gates are the only success criteria. The limits panel is the honest part — what the system deliberately does not automate.

Theme recovery

≥ 4/6 planted themes recovered, including ≥ 1 of the 2 latent themes. Measured on the synthetic corpus only and always labeled synthetic_only.
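The theme-recovery thresholds reduce to a small check. This sketch assumes the matching of candidate themes to planted themes has already been decided upstream. The function name and the planted-theme ids are hypothetical.

```python
def theme_recovery_gate(planted: dict[str, str], recovered: set[str]) -> dict:
    """planted maps planted-theme id -> "semantic" or "latent";
    recovered holds the planted ids judged to be matched by a candidate theme."""
    hits = [tid for tid in planted if tid in recovered]
    latent_hits = [tid for tid in hits if planted[tid] == "latent"]
    return {
        "recovered": f"{len(hits)}/{len(planted)}",
        "latent_recovered": len(latent_hits),
        # Gate: at least 4 of 6 planted themes, at least 1 of them latent.
        "passed": len(hits) >= 4 and len(latent_hits) >= 1,
        "label": "synthetic_only",  # never reported as a real-corpus result
    }
```

Recovering four semantic themes and no latent one fails this gate, which is the point of the latent clause.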

Verbatim grounding

Zero quotes in the final artifacts fail verbatim verification. The audit re-runs the string match for every quote against its source transcript.
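Here is one way to express this audit, assuming each quote is stored with its transcript id and char span. The sketch compares the quote against the exact recorded offsets, which is stricter than a bare substring test. Whether tools/verify_quotes.py also checks offsets is not stated here. audit_quotes is a hypothetical name.

```python
def audit_quotes(quotes, transcripts: dict[str, str]) -> list[tuple[str, str, str]]:
    """quotes: iterable of (transcript_id, quote, char_start, char_end).
    Returns every failure; the gate passes only when the list is empty."""
    failures = []
    for tid, quote, start, end in quotes:
        source = transcripts.get(tid)
        if source is None:
            failures.append((tid, quote, "missing transcript"))
        elif source[start:end] != quote:
            # Exact match at the recorded offsets, not just "appears somewhere".
            failures.append((tid, quote, "span mismatch"))
    return failures
```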

Full traceability

Every theme resolves to codes → excerpts → char spans. No orphans. If the trace breaks, the theme is not grounded.
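The traceability gate can be sketched as a walk from a theme down to its spans. The dictionary shapes are assumptions, and "orphan" is read here as a link that does not resolve. trace_theme is a hypothetical helper.

```python
def trace_theme(theme: dict, codes: dict, excerpts: dict):
    """Follow theme -> codes -> excerpts -> char spans.
    Returns (spans, problems); a theme is grounded only if problems is empty."""
    spans, problems = [], []
    if not theme["code_ids"]:
        problems.append("theme has no codes")
    for cid in theme["code_ids"]:
        code = codes.get(cid)
        if code is None:
            problems.append(f"missing code {cid}")
            continue
        excerpt = excerpts.get(code["excerpt_id"])
        if excerpt is None:
            problems.append(f"code {cid} points to missing excerpt {code['excerpt_id']}")
            continue
        spans.append((excerpt["transcript_id"], excerpt["char_start"], excerpt["char_end"]))
    return spans, problems
```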

Human-legible trail

A full run is readable end-to-end: every artifact and decision is JSON or markdown, diffable with git, auditable in under an hour.
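Diffability depends partly on how artifacts are serialized. The sketch below assumes artifacts are written with Python's standard json module; write_artifact is a hypothetical helper. Sorted keys and a fixed indent make identical content produce identical bytes, so git diffs show only real changes. ensure_ascii=False keeps accented names in transcripts readable instead of escaped.

```python
import json
from pathlib import Path


def write_artifact(path: Path, obj) -> None:
    # sort_keys + fixed indent: same content, same bytes, clean git diffs.
    text = json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```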

no κ

No inter-rater reliability

Reflexive TA treats subjectivity as a generative resource, not as bias. IRR presupposes that there is a correct coding to converge on; RTA rejects that premise. Reserve IRR for codebook-TA arms.

no auto-reflexivity

No autonomous reflexivity

The reconcile step writes drift memos as prompts for human reflexivity, not as a substitute. Every system we surveyed keeps a human as final interpretive authority.

Reading list

Twelve papers, one paragraph each.

The research foundation for v0. The full literature review lives in research.md; these are the ones that actually moved design choices.