--- /dev/null
+# How classgraph gets its data
+
+This document explains where every piece of information in the viewer
+comes from, how type-family resolution works, and what the user can and
+cannot trust about the rendered chains. It's targeted at someone who
+wants to know whether a given edge in the graph is "real" GHC information
+or an inferred (possibly fragile) reconstruction.
+
+## Table of contents
+
+- [Pipeline at a glance](#pipeline-at-a-glance)
+- [Where extraction runs](#where-extraction-runs)
+- [Classes](#classes)
+- [Class instances](#class-instances)
+- [Type and data families](#type-and-data-families)
+- [Type family instances](#type-family-instances)
+- [Type-family resolution in the viewer](#type-family-resolution-in-the-viewer)
+- [Are the resolution chains deterministic?](#are-the-resolution-chains-deterministic)
+- [What the viewer is *not* doing](#what-the-viewer-is-not-doing)
+
+## Pipeline at a glance
+
+```
+ ┌────────────────────┐ typecheck ┌─────────────────────┐
+ │ user's source code │ ──────────────► │ TcGblEnv (in GHC) │
+ └────────────────────┘ │ tcg_tcs │
+ │ tcg_insts │
+ │ tcg_fam_insts │
+ └──────────┬──────────┘
+ │ typeCheckResultAction
+ ┌──────────▼──────────┐
+ │ Classgraph.Extract │
+ │ (this repo's lib) │
+ └──────────┬──────────┘
+ │ JSON
+ ┌──────────▼──────────┐
+ │ classgraph-view │
+ │ merge + render │
+ └─────────────────────┘
+```
+
+All extraction happens **after typechecking** but **before desugaring to
+Core**. By that point GHC has already:
+
+- Resolved all class superclass theta to fully simplified `PredType`s.
+- Fully kind-elaborated every `Type`.
+- Computed the closed normal form of class instance heads.
+- Fully populated `tcg_insts` (the local instances) and
+ `tcg_fam_insts` (the local family instances).
+
+That's important: every fact we record is *what GHC itself sees and
+uses* once typechecking is done. We don't run our own typechecker, our
+own constraint solver, or our own type-family reducer over the source
+code. We just *transcribe* GHC's `TcGblEnv` into JSON.
+
+The single exception — **class-instance resolution chains via type
+families** — happens in the viewer (JavaScript), purely against the
+already-extracted JSON. That logic *is* a reconstruction, and is
+called out explicitly in the
+["Type family resolution in the viewer"](#type-family-resolution-in-the-viewer)
+section below.
+
+## Where extraction runs
+
+The plugin registers a single hook:
+
+```haskell
+plugin = defaultPlugin
+ { typeCheckResultAction = \_args _summ tcg_env -> do
+ liftIO (writeJsonForModule tcg_env)
+ return tcg_env
+ }
+```
+
+`typeCheckResultAction :: [CommandLineOption] -> ModSummary -> TcGblEnv -> TcM TcGblEnv`
+runs once per module compiled with the plugin, after the typechecker
+has finished but *before* GHC desugars to Core. We never modify
+`tcg_env`; the plugin is observation-only.
+
+This is the latest hook at which all the extracted data is reliably
+available without us having to do interface-file fishing. In
+particular, `tcg_insts` and `tcg_fam_insts` are already populated by
+this point with every instance the user wrote in this module, including
+deriving-clause instances and standalone-deriving instances.
+
+## Classes
+
+Source: `tcg_tcs :: [TyCon]` filtered by `tyConClass_maybe`. For each
+`Class` the schema records:
+
+| Schema field | GHC API |
+|-------------------|-----------------------------------------|
+| `ciName` | `className cls` — the class's `Name` |
+| `ciTyVars` | `classTyVars cls` |
+| `ciSuperclasses` | `classSCTheta cls`, decomposed below |
+| `ciAssocTypes` | `classATs cls` — the assoc family TyCons|
+| `ciMethods` | `classMethods cls` (only the names) |
+| `ciSrc` | `nameSrcSpan (className cls)` |
+
+Superclass theta needs a tiny bit of decomposition because GHC
+represents some constraint shapes structurally:
+
+- We use `classifyPredType :: PredType -> Pred` to split every
+ superclass into a (`ClassPred`, `EqPred`, `IrredPred`, `ForAllPred`).
+- `ClassPred (cTupleClass _) [a, b, ...]` is a constraint tuple — we
+ recurse into each `a`, `b` and emit one `SuperclassEdge` per element.
+ This is needed because GHC sometimes hands us a single tuple-shaped
+ pred where the user wrote `class (Foo a, Bar a) => …`.
+- The boxed equality classes `(~)` and `(~~)` are technically
+ `ClassPred`s (they really are classes in GHC's type theory), but for
+ rendering we recognise them by their `OccName` and emit them as
+ equality predicates with two operands rather than as edges to a
+ synthetic `~` node.
+- For all other class predicates we emit one edge.
+
+Args of class predicates run through a `visibleArgs` filter
+(`tyConBinders` + `isInvisibleTyConBinder`), so kind / runtime-rep
+arguments don't leak into the rendered superclass list.
+
+## Class instances
+
+Source: `tcg_insts :: [ClsInst]`. Each `ClsInst` becomes one
+`InstanceInfo`:
+
+| Schema field | GHC API |
+|----------------|------------------------------------------------------|
+| `iiClass` | from `instanceSig inst` |
+| `iiArgs` | from `instanceSig inst` (the head args) |
+| `iiContext` | from `instanceSig inst` (the theta), decomposed |
+| `iiTyVars` | from `instanceSig inst` |
+| `iiOrphan` | `isOrphan (is_orphan inst)` |
+| `iiOverlap` | pretty-print of `is_flag inst` |
+| `iiSrc` | `nameSrcSpan (varName (is_dfun inst))` |
+
+`instanceSig :: ClsInst -> ([TyVar], [Type], Class, [Type])` returns the
+fully resolved (`tvs`, `theta`, `cls`, `args`) for the instance's
+dictionary function. That theta is whatever GHC stored on the dfun
+after typechecking. So the context we display is exactly what the
+solver would require to use the instance.
+
+Two non-obvious choices:
+
+- **`iiSrc` from the dfun, not the class.** The dfun's `Name` is tagged
+ with the source location of the `instance …` declaration GHC
+ synthesised it from. The class's `Name` only has a useful span when
+ the class was defined in a module compiled in the *current* package —
+ for classes loaded from another package's `.hi` file the span is
+ `UnhelpfulSpan` (interface files don't preserve source locations).
+ Using the dfun span means "Defined at" works uniformly.
+
+- **`iiContext` decomposition** uses the same `classifyPredType` +
+ CTuple / equality logic as superclass theta, so a user-written
+ `instance (Foo a, Bar a) => Baz a` produces two distinct context
+ predicates rather than one tuple-shaped one.
+
+## Type and data families
+
+Source: `tcg_tcs` filtered by `isFamilyTyCon`, plus the union with
+every class's `classATs` (associated families belong to a class but
+aren't always also in `tcg_tcs` directly). Per family:
+
+| Schema field | GHC API |
+|-----------------|--------------------------------------------------|
+| `tfName` | `tyConName tc` |
+| `tfTyVars` | `tyConTyVars tc` |
+| `tfFlavor` | from `famTyConFlav_maybe tc` and the class index |
+| `tfResultKind` | `tyConResKind tc`, pretty-printed |
+| `tfSrc` | `nameSrcSpan (tyConName tc)` |
+| `tfEquations` | for closed families: branches of the `CoAxiom` |
+
+Flavor taxonomy:
+
+- `OpenSynFamilyTyCon` → `OpenFam`
+- `ClosedSynFamilyTyCon (Maybe (CoAxiom Branched))` → `ClosedFam`,
+ with the branches extracted into `tfEquations`
+- `BuiltInSynFamTyCon{}` → `ClosedFam` (treated as closed for our purposes)
+- `DataFamilyTyCon{}` → `DataFam`
+- Anything found under some class's `classATs` → `AssocFam parentClass`
+
+Closed-family equations don't live in `tcg_fam_insts` — they're
+`CoAxBranch`es inside the family's `CoAxiom Branched`. We extract them
+via `coAxBranches (co_ax_branches ax)` and treat each branch as a
+`FamInstInfo` for rendering purposes.
+
+## Type family instances
+
+Source: `tcg_fam_insts :: [FamInst]`. Each becomes one `FamInstInfo`:
+
+| Schema field | GHC API |
+|---------------|------------------------------------------------------------|
+| `fiFamily` | `fi_fam fi` |
+| `fiTyVars` | `fi_tvs fi` |
+| `fiArgs` | `fi_tys fi`, filtered through `visibleArgs (famInstTyCon fi)` |
+| `fiRhs` | `fi_rhs fi` |
+| `fiSrc` | `coAxBranchSpan (coAxiomSingleBranch (fi_axiom fi))` |
+| `fiIsData` | `fi_flavor fi == DataFamilyInst _` |
+
+A few extraction choices worth documenting:
+
+- **Visible-arg filter on `fi_tys`.** GHC stores the kind args of a
+ poly-kinded family alongside the value args. For
+ `data family Wrap (l :: K)` we'd otherwise get
+ `Wrap (((TYPE (BoxedRep Lifted)) -> …) l)` instead of `Wrap l`.
+ `tyConBinders (famInstTyCon fi)` tells us which positions are
+ invisible binders; we drop those.
+
+- **`fiSrc` from the axiom branch, not the family name.** Each
+ fam-instance has its own `CoAxiom Unbranched` representing the
+ equation; `coAxBranchSpan` of its single branch points at the
+ `data instance` / `type instance` declaration in the user's source.
+ The family's `Name` would point at the `data family` / `type family`
+ declaration, which is wrong for every instance after the first.
+
+- **Data-family RHS is intentionally hidden in the viewer.** GHC
+ synthesises a fresh data-constructor TyCon for every
+ `data instance`, with a name like `R:Foo[List]Intmk`. That name has
+ no source-level meaning, so we only record it in the schema (for
+ completeness) and the viewer drops it from labels when `fiIsData`.
+
+## Type-family resolution in the viewer
+
+So far everything has been *transcription*: we record what GHC told us
+and stop. The one place we go beyond transcription is the **resolution
+chains** in the instance view. This part is computed in JavaScript
+against the already-extracted JSON, *not* by GHC.
+
+Setup. For an instance view focused on a class `C`, we draw — among
+other things — every superclass requirement of `C`'s declaration with
+the focused-instance's args substituted in. Concretely, for each
+superclass edge `seSuperclass = S, seArgs` of `C`, and each instance
+`inst` of `C`, the constraint is `S (subst seArgs inst.iiArgs)`.
+
+That `subst seArgs inst.iiArgs` may produce something with a
+`FamilyApp` in it, e.g. `S (F a) (F b)` where `F` is some type family.
+The viewer then asks "for which fam-instances of `F` does this pred
+become solvable, and which class instance of `S` would satisfy it?".
+
+The algorithm:
+
+1. **Find the family applications.** `collectFamilyRefs` walks the
+ substituted args and collects every `FamilyApp` qualname.
+
+2. **For each fam-instance `fi` of that family:**
+
+ 1. **Bidirectional unification** between `fi.fiArgs` and the
+ `FamilyApp`'s args. `biUnify` allows TyVarRefs on either side to
+ bind to anything on the other side. *If unification fails, this
+ `fi` is irrelevant — skip it entirely.* (This is what stops e.g.
+ a `Foo Int`-shaped predicate from being matched against a
+ `type instance F Bool = …` equation.)
+
+ 2. **Substitute `fi.fiRhs` into the predicate.** The bidirectional
+ unification gave us substitutions for both fi's tyvars (used to
+ rewrite fi's RHS) and the predicate's tyvars (used to specialise
+ the rest of the predicate). The result is a fully-substituted
+ predicate with no occurrence of the originating family.
+
+ 3. **Reduce the result.** `reduceTypeArg` walks the resulting
+ `TypeArg` and tries to apply *any* available fam-instance
+ (using one-sided unification — predicate vars stay free) to
+ collapse remaining family applications. This handles chained
+ reductions like `F (G x)` where `F` and `G` are both reducible.
+
+ 4. **Find a matching class instance.** `findMatchingInstances`
+ scans `pdInstances` for an instance of the predicate's class
+ whose `iiArgs` shape-match the reduced args (TyVarRefs on
+ either side acting as wildcards).
+
+ 5. **Draw the chain.** Edge from focused instance → family node →
+ `fi` node → matched class instance.
+
+If any step fails — relevance check, no fam-instance applies, no
+matching class instance exists in the dump — the chain stops at the
+last node we can justify.
+
+## Are the resolution chains deterministic?
+
+For the **transcribed data** (classes, instances, type families,
+fam-instances, contexts, superclass thetas, source spans, flags) the
+answer is unambiguous: it's exactly what GHC observed at the end of
+typechecking. No reconstruction, no heuristic, no overlap resolution
+on our side. If you re-run the plugin against the same source with the
+same GHC version you get bit-identical JSON.
+
+For the **resolution chains in the instance view**, the picture is
+more nuanced. Here's what can and cannot vary:
+
+- **Open type families with overlapping instances.** GHC forbids
+ overlapping open-family equations, so for any concrete argument list
+ there is at most one applicable `type instance`. But the viewer
+ walks fam-instances by *order in `pdFamInstances`*, which depends on
+ the order modules were compiled. If you have `type instance F Int = …`
+ in `pkg-A` and `type instance F Bool = …` in `pkg-B`, both are
+ presented as alternative resolutions; the user sees both chains
+ because both are *possible* satisfiers — without further input
+ (`a := Int` vs `a := Bool`) GHC itself can't pick one. The viewer
+ is honest about that ambiguity rather than hiding it.
+
+- **Closed type families with branch order.** GHC's evaluation rule is
+ "first matching branch wins". The viewer's `reduceTypeArg` mimics
+ this: it tries fam-instances in extraction order and stops at the
+ first that unifies. For closed families this is the same order as
+ the source-code branches, so the result matches GHC's. For open
+ families across multiple modules the order is build-graph-dependent
+ but, because open-family overlap is forbidden, the *set* of
+ applicable equations is unique even if the iteration order isn't.
+
+- **Ambiguous matches in `findMatchingInstances`.** Our class-instance
+ matcher treats TyVarRefs on either side as wildcards. That's
+ intentionally permissive: in the absence of a full constraint solver
+ we'd rather show the user "any of these instances might apply" than
+ hide instances behind a stricter check that would miss real
+ resolutions. The user sees every match as a separate chain edge and
+ decides which is meaningful in their context.
+
+- **Reductions stopping early.** If a fam-instance's RHS *itself*
+ mentions another type family (a common shape: `type Element (Wrap
+ c) = Element c` inside an associated family with the same name),
+ the viewer's reducer can't make further progress unless the
+ next-step fam-instance is also in the dumps. Chains terminate at
+ whichever fam-instance node the reducer last successfully unified
+ with — honest about the dependency boundary.
+
+- **Type-family functions defined by GHC itself** (e.g.
+ `Data.Type.Equality.==`, `If`, type-level arithmetic from
+ `GHC.TypeLits`) come through as `BuiltInSynFamTyCon` and are
+ transcribed but **not** evaluated by the viewer. We don't
+ reimplement GHC's built-in family rewrites. If a chain involves
+ `1 + 1` where `(+)` is `GHC.TypeNats.+`, the viewer will stop at
+ `+` rather than producing `2`.
+
+- **`UndecidableInstances` / cyclic reduction.** The reducer is
+ iterative, not fixed-point: each call visits the structure once and
+ then bails. We don't loop forever on a divergent chain. In practice
+ this means a circular family like `type instance F a = G a` /
+ `type instance G a = F a` shows up as two separate fam-instance
+ nodes pointing at each other via `fiRhs` references, but no edge in
+ the resolution-chain layer.
+
+So: *the data is deterministic and faithful, the chains are an honest
+best-effort visualization*. When in doubt, the chain edges in the
+instance view are hints; the underlying transcribed data
+(`pdInstances`, `pdFamInstances`, `tfEquations`, etc.) is ground
+truth.
+
+## What the viewer is *not* doing
+
+A few things we deliberately don't do, in case you wonder:
+
+- **No instance overlap resolution.** GHC's instance solver picks one
+ instance from a set of applicable ones using overlap pragmas
+ (`{-# OVERLAPPING #-}` etc.), specificity, and module-context rules.
+ We record `iiOverlap` as text on each instance but don't simulate
+ the solver. If two instances of the same class match a given
+ argument shape, the viewer shows both with the same chain edge.
+
+- **No constraint solving.** We don't try to compute, given a type, the
+ full set of class instances reachable through context constraints.
+ The instance view shows direct context predicates (`iiContext`) and
+ one hop of superclass requirements; transitive resolution is left
+ to the user clicking through.
+
+- **No quantified-constraint handling.** `forall a. Eq a => Eq (T a)`
+ appearing in a context is `ForAllPred` and we drop it from the
+ `iiContext` display. Quantified constraints are GHC-9-and-up but
+ rare; adding them would require rendering the implication shape.
+
+- **No analysis of `INCOHERENT` / `OVERLAPPABLE` / overlap warnings.**
+ We just show every instance.
+
+- **No re-typechecking when JSON is loaded.** The viewer trusts the
+ JSON. If you hand-edit `.classgraph/*.json`, the viewer will happily
+ render whatever's there — it has no way to detect inconsistency.
+
+If any of these matter for your use case and the answer is "we should
+do it", the place to start is `Classgraph.Extract` (for fields we'd
+need to add to the schema) or `viewer.js`'s `buildInstanceView` /
+`reduceTypeArg` / `biUnify` (for the in-viewer reconstruction).