From: Javier Sagredo Date: Thu, 7 May 2026 01:07:03 +0000 (+0200) Subject: Refresh INTERNALS.md for today's data-family + resolution work X-Git-Url: https://git.sagredo.dev/?a=commitdiff_plain;h=d8c99ff73b6cb908584fdab5db689b5a56a5ce1e;p=classgraph.git Refresh INTERNALS.md for today's data-family + resolution work * Type family instances section: documents the data-family R: rewrite (rep tyvars substituted with use-site args), the resulting fiRhs being structurally equal to the LHS (and the reducer guarding against the loop), the famInstTyCon-vs-rep_tc quirk, and the new fiDataCons/DataConInfo extraction. * Resolution algorithm: adds the per-pivot reduceTypeArg step in replaceFamilyApp (today's fix that lets nested-family use sites like F (G a) finally unify with concrete fi candidates), the fall-back chain edge to the predicate node when no class instance matches, and renumbers the surrounding steps. * Predicate-node placeholders: documents the empty-args use-site skip (no placeholder for unapplied-family references like `class HasTables l` with `HasTables LedgerState` in the context), and the side panel's two-reason explanation of why a placeholder fired. Co-Authored-By: Claude Opus 4.7 (1M context) --- diff --git a/docs/INTERNALS.md b/docs/INTERNALS.md index 72cd3b6..f73bdcb 100644 --- a/docs/INTERNALS.md +++ b/docs/INTERNALS.md @@ -193,9 +193,11 @@ Source: `tcg_fam_insts :: [FamInst]`. Each becomes one `FamInstInfo`: | `fiFamily` | `fi_fam fi` | | `fiTyVars` | `fi_tvs fi` | | `fiArgs` | `fi_tys fi`, filtered through `visibleArgs (famInstTyCon fi)` | -| `fiRhs` | `fi_rhs fi` | +| `fiRhs` | `fi_rhs fi` (note: for data fam-instances this is the synthetic R: TyCon, which we then rewrite — see below) | | `fiSrc` | `coAxBranchSpan (coAxiomSingleBranch (fi_axiom fi))` | | `fiIsData` | `fi_flavor fi == DataFamilyInst _` | +| `fiDataCons` | `tyConDataCons rep_tc`, where `rep_tc` is pulled out of `DataFamilyInst rep_tc` (see below) | +| `fiDefinedIn` | filled in by the merge step from the enclosing `ModuleDump`'s package id (after normalisation) | A few extraction choices worth documenting: @@ -213,11 +215,42 @@ A few extraction choices worth documenting: The family's `Name` would point at the `data family` / `type family` declaration, which is wrong for every instance after the first. -- **Data-family RHS is intentionally hidden in the viewer.** GHC - synthesises a fresh data-constructor TyCon for every - `data instance`, with a name like `R:Foo[List]Intmk`. That name has - no source-level meaning, so we only record it in the schema (for - completeness) and the viewer drops it from labels when `fiIsData`. +- **Data-family RHS is rewritten to the abstract family form.** GHC + represents `data instance Foo Args = …` internally as a synthetic + `R:FooArgs` TyCon. When `R:FooArgs` shows up inside a constraint + or instance head, the user reads it as `Foo Args` (the abstract + application). `Classgraph.Extract.typeArg` gates on + `tyConFamInst_maybe` *before* the regular `TyConApp` path: a TyCon + for which it returns `Just (parent, parentArgs)` is emitted as + `FamilyApp parent parentArgs'`, where `parentArgs'` is + `parentArgs` after substituting the rep TyCon's tyvars + (`tyConTyVars rep_tc`) with the use-site args from + `splitTyConApp_maybe`. Without that substitution, `parentArgs` + references the rep's *internal* tyvars and they leak through + `typeArg` as `OtherArg ""` because they're not in the + caller's `boundTvs`. + +- **Data-family RHS is *also* circular by construction.** The + rewrite above means the `fiRhs` of a data fam-instance is + structurally equal to its LHS (e.g. for `data instance Crate + Int = …`, `fiRhs` is `FamilyApp Crate [Int]`). The viewer hides + the RHS for data fam-instances in node labels (and in the side + panel — see below); `reduceTypeArg` skips `fiIsData` rows + entirely to avoid an infinite recursion that would otherwise + loop forever rewriting `Crate Int` → `Crate Int`. + +- **`fiDataCons` from `rep_tc`, not `famInstTyCon`.** For a data + fam-instance, `famInstTyCon` returned the family TyCon (not the + rep) on at least GHC 9.14.1, opposite to what the GHC docs + suggested. We pull the rep TyCon directly out of the + `DataFamilyInst rep_tc` constructor and call `tyConDataCons` on + it. Each `DataCon` becomes a `DataConInfo` with the constructor's + name, its arg types (as `TypeArg`s in the fam-instance's + `fiTyVars` scope), and its field labels (when record syntax is + used; otherwise an empty list). The viewer renders just the + constructor *names* on the node label (`Family args = ConA | + ConB`), and the full constructor declarations (with field types + / record fields) in the side panel. ## Type-family resolution in the viewer @@ -244,36 +277,65 @@ The algorithm: 2. **For each fam-instance `fi` of that family:** - 1. **Bidirectional unification** between `fi.fiArgs` and the - `FamilyApp`'s args. `biUnify` allows TyVarRefs on either side to - bind to anything on the other side. *If unification fails, this - `fi` is irrelevant — skip it entirely.* (This is what stops e.g. - a `Foo Int`-shaped predicate from being matched against a - `type instance F Bool = …` equation.) - - 2. **Substitute `fi.fiRhs` into the predicate.** The bidirectional + 1. **Reduce the pivot's sub-args first.** Before unifying we run + `reduceTypeArg` over the *FamilyApp pivot's sub-args*, so a + nested family application gets unwrapped where possible. For + a use site `F (G a)` with `type instance G _ = something`, + the pivot's args go from `[FamilyApp G [a]]` to + `[something]` — which is what gives `biUnify` a chance to + match `F`'s fam-instances. Without this step nested-family + use sites looked unresolvable even when the equations to make + them resolvable were right there. + + 2. **Bidirectional unification** between `fi.fiArgs` and the + (reduced) `FamilyApp`'s args. `biUnify` allows TyVarRefs on + either side to bind to anything on the other side. *If + unification fails, this `fi` is irrelevant — skip it + entirely.* (This is what stops e.g. a `Foo Int`-shaped + predicate from being matched against a `type instance F Bool + = …` equation.) The wildcard rule is intentionally + permissive — when the inner reduction unwraps to a `TyVarRef` + (an abstract tyvar in the use site), every concrete + data-instance of the outer family ends up matching it. The + user sees a chain edge per match; we deliberately don't try + to disambiguate further. + + 3. **Substitute `fi.fiRhs` into the predicate.** The bidirectional unification gave us substitutions for both fi's tyvars (used to rewrite fi's RHS) and the predicate's tyvars (used to specialise the rest of the predicate). The result is a fully-substituted predicate with no occurrence of the originating family. - 3. **Reduce the result.** `reduceTypeArg` walks the resulting + 4. **Reduce the result.** `reduceTypeArg` walks the resulting `TypeArg` and tries to apply *any* available fam-instance (using one-sided unification — predicate vars stay free) to collapse remaining family applications. This handles chained reductions like `F (G x)` where `F` and `G` are both reducible. + Data fam-instances are skipped here (their `fiRhs` is + structurally equal to the LHS post-rewrite, so naïve + recursion looped forever). - 4. **Find a matching class instance.** `findMatchingInstances` + 5. **Find a matching class instance.** `findMatchingInstances` scans `pdInstances` for an instance of the predicate's class whose `iiArgs` shape-match the reduced args (TyVarRefs on - either side acting as wildcards). + either side acting as wildcards). If one or more are found, + draw a `fam-resolves` edge from the `fi` node to each. + + 6. **Fall back to the predicate node.** If no class instance + matched (typically because the class's defining package + isn't in the dumps), draw a `fam-resolves` edge from the + `fi` node to the originating *predicate node* (the one + created from the context constraint or unmatched superclass). + Without this fallback the fam-instance ended up visually + orphaned even though we knew exactly which constraint it was + discharging. - 5. **Draw the chain.** Edge from focused instance → family node → - `fi` node → matched class instance. + 7. **Draw the chain.** Net result: focused instance → predicate + node → fam-instance(s) → class instance(s) (or back to the + predicate node if no class instance was found). -If any step fails — relevance check, no fam-instance applies, no -matching class instance exists in the dump — the chain stops at the -last node we can justify. +If the relevance check kills every `fi` of `fa` — see the next +section on placeholders for what happens in that branch. ## Are the resolution chains deterministic? @@ -481,6 +543,26 @@ A few things to know: visual language as the placeholder, so the chain reads as one continuous "outside this project" thread. +- **Unapplied-family use sites get no placeholder.** When a + constraint references a family with *no* args — passed as a + higher-kinded thing, e.g. `class HasTables l where …` and a + context `HasTables LedgerState` where `LedgerState` is a data + family of kind `Type -> MapKind -> Type` left unapplied — the + use-site args list is empty. A `LedgerState = ?` placeholder + would convey nothing beyond what the family node itself shows, + so the synthesizer skips empty-args use sites entirely. + Concretizing the unapplied form to e.g. `LedgerState + (ShelleyBlock proto era) mk` would require constraint-solver- + like propagation across the constraint set (matching `l ~ + LedgerState (ShelleyBlock proto era)` from elsewhere) — work + GHC does at typecheck time but we don't preserve in the dump. + +- **Side panel honestly explains why a placeholder fired.** The + panel for a placeholder fam-instance lists both common reasons + (external family vs. nested/abstract args we couldn't reduce + through) so the user knows which dump to add or which limitation + is in play, rather than silently looking unresolved. + ## What the viewer is *not* doing A few things we deliberately don't do, in case you wonder: