ADR-0002: Object-Keyed, Hash-Pinned Ontology Vendoring Index

Status: accepted

The ontology corpus is served at https://mif-spec.dev/ontologies/ from a static mirror generated by modeled-information-format/MIF’s scripts/snapshot-ontology-version.py. Its machine-readable catalog, index.json, is the entry point a consumer reads to discover and fetch an ontology. The flagship consumer is the research harness (modeled-information-format/research-harness-template), whose ADR-0012 vendors domain ontologies on demand: scripts/fetch-ontology.sh reads the index, resolves an ontology’s extends closure, downloads each layer, verifies its sha256 against the index fail-closed, materializes it under packs/ontologies/<id>/, and pins it in ontologies.lock.json; scripts/check-ontology-lock.sh then proves no drift.
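
The "resolves an ontology's extends closure" step can be sketched as a depth-first walk over an object-keyed index. This is an illustration only, not the logic of fetch-ontology.sh: the function name, the parents-first ordering, the cycle handling, and the sample ids are all assumptions.

```python
from typing import Dict, List, Set


def resolve_extends_closure(ontologies: Dict[str, dict], root: str) -> List[str]:
    """Return root and everything it transitively extends, parents first."""
    order: List[str] = []
    visiting: Set[str] = set()  # ids on the current recursion path
    done: Set[str] = set()      # ids already placed in `order`

    def visit(oid: str) -> None:
        if oid in done:
            return
        if oid in visiting:
            raise ValueError(f"extends cycle through {oid!r}")
        if oid not in ontologies:
            raise KeyError(f"{oid!r} is not in the index")
        visiting.add(oid)
        for parent in ontologies[oid].get("extends", []):
            visit(parent)
        visiting.discard(oid)
        done.add(oid)
        order.append(oid)

    visit(root)
    return order


# Hypothetical ids, for illustration only.
sample = {
    "core": {"extends": []},
    "bio": {"extends": ["core"]},
    "genomics": {"extends": ["bio", "core"]},
}
closure = resolve_extends_closure(sample, "genomics")
print(closure)  # ['core', 'bio', 'genomics']
```

Emitting parents before children means each layer could be downloaded and verified before anything that depends on it; whether the real fetcher orders layers this way is not stated here.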

For that contract to hold, the index must answer three questions per ontology, by key: what file to fetch, what its sha256 must be, and what it extends. The corpus authoring repo’s own scripts/gen-ontology-index.sh already emits exactly that shape — an object keyed by id, each entry {version, file, sha256, extends[]}.

The served index.json does not match. The MIF snapshot generator emits a discovery-oriented array of {id, version, canonical, yaml, versioned} with no integrity hash and no extends. A consumer that evaluates index.ontologies[id] is indexing an array with a string key, which fails; even if the lookup is rewritten as a scan, the entry carries no sha256 to verify against, so the fail-closed fetch cannot complete. The two index designs (the array the mirror serves, and the object the harness and the authoring generator expect) were built separately and never reconciled. As a result, on-demand vendoring cannot be adopted, which blocks the harness epic’s children #222 (a present, gate-clean lock) and #224 (flipping bundled packs to an on-demand cache).
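
The mismatch can be reproduced in a few lines. The sketch below assumes an array-shaped index with a single entry; the id, version, and example.invalid URLs are placeholders, not values from the real catalog.

```python
# Array-shaped index, as the mirror currently serves it (placeholder values).
served = {
    "ontologies": [
        {
            "id": "core",
            "version": "1.0.0",
            "canonical": "https://example.invalid/ontologies/core",
            "yaml": "https://example.invalid/ontologies/core.ontology.yaml",
            "versioned": "https://example.invalid/ontologies/1.0.0/core.ontology.yaml",
        }
    ]
}

# 1. Key lookup: indexing a list with a string raises TypeError.
try:
    served["ontologies"]["core"]
    key_lookup_error = None
except TypeError as exc:
    key_lookup_error = exc

# 2. Falling back to a scan finds the entry, but the fetch fields are absent.
scanned = next(e for e in served["ontologies"] if e["id"] == "core")
missing = [f for f in ("file", "sha256", "extends") if f not in scanned]

print(type(key_lookup_error).__name__)  # TypeError
print(missing)                          # ['file', 'sha256', 'extends']
```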

  • When a consumer reads the index for an ontology id, the consumer shall obtain that ontology’s fetch file, its sha256, and its extends list by key lookup, with no scan and no external table.
  • When a consumer fetches an ontology layer, the consumer shall verify the downloaded bytes against an index-supplied sha256 and refuse on mismatch (fail-closed); the index shall therefore carry a per-entry integrity hash.
  • The served index and the authoring repo’s gen-ontology-index.sh output shall be the same shape, so the corpus has one index contract, not two.
  • When a person or a discovery tool reads the index, the entry shall still expose the canonical, yaml, and versioned URLs it exposes today.
  • The change shall keep the human index.html catalog and the snapshot --check gate working.
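
The fail-closed requirement in the second driver can be sketched as a check that refuses both on a digest mismatch and on a missing digest. The exception type and function name are hypothetical, and the lowercase-hex encoding of sha256 is an assumption about the index.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when a downloaded layer cannot be proven to match the index."""


def verify_layer(data: bytes, entry: dict) -> str:
    """Return the verified digest, or raise; there is no 'warn and continue' path."""
    expected = entry.get("sha256")
    if not expected:
        # Fail closed: an entry without a hash is treated as unverifiable.
        raise IntegrityError("index entry carries no sha256; refusing to vendor")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected.lower():
        raise IntegrityError(f"sha256 mismatch: index={expected} downloaded={actual}")
    return actual


payload = b"id: core\n"  # stand-in for the downloaded *.ontology.yaml bytes
entry = {"sha256": hashlib.sha256(payload).hexdigest()}

verified = verify_layer(payload, entry)  # passes

try:
    verify_layer(payload + b"extra: true\n", entry)
    tamper_rejected = False
except IntegrityError:
    tamper_rejected = True
```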

Option 1: Keep the served array index, no integrity hash

Leave snapshot-ontology-version.py emitting the array {id, version, canonical, yaml, versioned}.

  • Pro: No generator change; the discovery site and --check gate are untouched.
  • Con: The fail-closed fetch contract is unsatisfiable — no key lookup, no sha256, no extends. On-demand vendoring stays blocked indefinitely.
  • Technical risk: High. The published contract cannot serve its flagship machine consumer; the harness fetcher cannot run against it.
  • Schedule risk: Blocks the dependent harness epic (#222, #224) with no path forward.
  • Ecosystem risk: Two divergent index shapes persist across three repos.

Option 2: Object-keyed, hash-pinned index, discovery fields preserved (chosen)

Change snapshot-ontology-version.py so index.ontologies is an object keyed by id, each entry {version, file, sha256, extends[], canonical, yaml, versioned}. The file/sha256/extends fields satisfy the fetch contract (and match gen-ontology-index.sh); the canonical/yaml/versioned fields preserve discovery. The index.html builder iterates the object’s values instead of the array.
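
A minimal sketch of that reshaping, assuming the generator already holds the old array entries, the served YAML bytes, and each ontology's extends list. The function names, the `<id>.ontology.yaml` file naming, and the HTML row format are illustrative assumptions, not the actual code in snapshot-ontology-version.py.

```python
import hashlib
import html
from typing import Dict, Iterable, List


def build_index(array_entries: Iterable[dict],
                yaml_bytes: Dict[str, bytes],
                extends: Dict[str, List[str]]) -> dict:
    """Re-key array entries by id and add the fetch fields to each."""
    ontologies = {}
    for old in array_entries:
        oid = old["id"]
        ontologies[oid] = {
            "version": old["version"],
            "file": f"{oid}.ontology.yaml",  # assumed naming
            "sha256": hashlib.sha256(yaml_bytes[oid]).hexdigest(),
            "extends": list(extends.get(oid, [])),
            # Discovery fields carried over unchanged.
            "canonical": old["canonical"],
            "yaml": old["yaml"],
            "versioned": old["versioned"],
        }
    return {"ontologies": ontologies}


def catalog_rows(index: dict) -> List[str]:
    """Iterate the object instead of the array when building the HTML catalog."""
    rows = []
    for oid, entry in sorted(index["ontologies"].items()):
        href = html.escape(entry["canonical"])
        rows.append(f'<li><a href="{href}">{html.escape(oid)}</a> '
                    f'{html.escape(entry["version"])}</li>')
    return rows


# Placeholder data, for illustration only.
old_entries = [{
    "id": "core", "version": "1.0.0",
    "canonical": "https://example.invalid/ontologies/core",
    "yaml": "https://example.invalid/ontologies/core.ontology.yaml",
    "versioned": "https://example.invalid/ontologies/1.0.0/core.ontology.yaml",
}]
index = build_index(old_entries, {"core": b"id: core\n"}, {"core": []})
rows = catalog_rows(index)
```

Because the id moves from a field to the key, the builder in this sketch iterates key/value pairs rather than bare values so that it can still display the id.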

  • Pro: One index contract across the mirror, the authoring generator, and the harness fetcher. Fail-closed integrity is satisfiable by key lookup. Discovery URLs are retained, so no consumer loses information.
  • Con: It is a breaking change to a published catalog contract: any external reader of the array shape must update, and the MIF --check gate plus index.html builder must change in lockstep.
  • Technical risk: Low-medium. The shape is already proven by gen-ontology-index.sh; the work is porting it into the MIF snapshot generator and updating the two in-repo consumers (HTML builder, --check).
  • Schedule risk: Medium. Requires a coordinated MIF change and redeploy before the harness can flip; the redeploy is asynchronous (Pages CI).
  • Ecosystem risk: Medium. Breaking the published array shape affects any unknown external reader; mitigated by the corpus being pre-1.0 and the array index having no known integrity-bearing consumer.

Option 3: Keep the array index; weaken the consumer to trust-on-first-use

Leave the served index as an array and change the harness fetcher to scan it and pin the sha256 it computes on first download (TOFU), dropping the index cross-check.

  • Pro: No MIF generator change; smallest surface.
  • Con: Downgrades the supply-chain posture from fail-closed index-cross-checked integrity to trust-on-first-use, removing the defense against a compromised mirror/CDN. The harness constitution marks fail-closed supply chain non-negotiable.
  • Technical risk: Low; the change is small to build.
  • Schedule risk: Low; fast to ship.
  • Ecosystem risk: Unacceptable posture downgrade on a published registry; pushes the integrity burden onto every consumer and weakens the guarantee for all of them.
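
The posture difference between Option 3 and the index cross-check can be illustrated with a toy comparison. Both pinning functions are hypothetical, and the sketch assumes the index entry's sha256 is the trusted reference value.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def tofu_pin(downloaded: bytes, lock: dict, oid: str) -> bool:
    """Trust-on-first-use: pin whatever arrives first, compare only afterwards."""
    digest = sha256_hex(downloaded)
    if oid not in lock:
        lock[oid] = digest  # nothing to compare against on first download
        return True
    return lock[oid] == digest


def cross_checked_pin(downloaded: bytes, entry: dict, lock: dict, oid: str) -> bool:
    """Fail-closed: pin only if the bytes match the index-supplied sha256."""
    digest = sha256_hex(downloaded)
    if digest != entry.get("sha256"):
        return False
    lock[oid] = digest
    return True


genuine = b"id: core\n"
tampered = b"id: core\ninjected: true\n"
entry = {"sha256": sha256_hex(genuine)}

tofu_lock: dict = {}
tofu_accepts_tampered = tofu_pin(tampered, tofu_lock, "core")

checked_lock: dict = {}
checked_accepts_tampered = cross_checked_pin(tampered, entry, checked_lock, "core")

print(tofu_accepts_tampered, checked_accepts_tampered)  # True False
```

Under TOFU the tampered bytes are pinned as if they were genuine, so later drift checks would defend the wrong content.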

Adopt Option 2. The served https://mif-spec.dev/ontologies/index.json becomes an object keyed by ontology id, each entry carrying the fetch fields {version, file, sha256, extends[]} alongside the existing discovery fields {canonical, yaml, versioned}. snapshot-ontology-version.py is changed to emit this shape and to compute each entry’s sha256 over the served *.ontology.yaml (the bytes the fetcher downloads and pins); its index.html builder iterates the object’s values; the snapshot --check gate validates the new shape. The shape matches the authoring repo’s gen-ontology-index.sh, so the corpus carries one index contract end to end, and the harness’s fail-closed fetch-ontology.sh / check-ontology-lock.sh work unchanged against it.
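
What "the snapshot --check gate validates the new shape" might involve can be sketched as a function that returns a list of problems. The field list comes from the entry shape above; the 64-character lowercase-hex rule and the dangling-extends check are assumptions, and none of this is the actual --check implementation.

```python
import re
from typing import List

REQUIRED_FIELDS = ("version", "file", "sha256", "extends",
                   "canonical", "yaml", "versioned")
SHA256_HEX = re.compile(r"[0-9a-f]{64}")  # assumed encoding


def check_index_shape(index: dict) -> List[str]:
    """Return human-readable problems; an empty list means the shape is valid."""
    ontologies = index.get("ontologies")
    if not isinstance(ontologies, dict):
        return ["ontologies must be an object keyed by id, not "
                + type(ontologies).__name__]
    problems: List[str] = []
    for oid, entry in ontologies.items():
        if not isinstance(entry, dict):
            problems.append(f"{oid}: entry must be an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in entry:
                problems.append(f"{oid}: missing {field}")
        sha = entry.get("sha256")
        if sha is not None and not (isinstance(sha, str) and SHA256_HEX.fullmatch(sha)):
            problems.append(f"{oid}: sha256 is not 64 lowercase hex characters")
        parents = entry.get("extends", [])
        if not isinstance(parents, list):
            problems.append(f"{oid}: extends must be a list")
            continue
        for parent in parents:
            if parent not in ontologies:
                problems.append(f"{oid}: extends unknown id {parent!r}")
    return problems


old_shape_problems = check_index_shape({"ontologies": [{"id": "core"}]})
print(old_shape_problems)  # ['ontologies must be an object keyed by id, not list']
```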

  • The fail-closed on-demand vendoring contract becomes satisfiable; the harness epic’s #222 (present, gate-clean lock) and #224 (on-demand cache flip) unblock.
  • One index shape across the mirror, the authoring generator, and every consumer — no second design to keep in sync.
  • Per-entry sha256 gives mirror/CDN-tamper detection at fetch time, not just post-vendor drift detection.
  • It breaks the published array contract: any external reader of the current shape must update, and the change must land with the MIF --check gate and index.html builder in one move or the build fails.
  • Completion depends on an async public redeploy of mif-spec.dev; the harness flip cannot be verified end to end until that deploy is live.
  • The discovery fields (canonical, yaml, versioned) are unchanged; only the container shape changes and the fetch fields are added.
  • Per-ontology version semantics are unchanged — each entry still reports the ontology’s own version, independent of the corpus release versions.

The decision meets its drivers. A consumer resolves fetch file, sha256, and extends by key lookup (primary driver one); the per-entry sha256 lets the fetcher verify downloaded bytes fail-closed (primary driver two); and the shape is the one gen-ontology-index.sh already emits, so the corpus has a single index contract (primary driver three). Discovery URLs are retained (secondary driver one), and the generator change updates the index.html builder and --check gate together (secondary driver two).

The residual cost — a breaking change to a published contract gated on an async redeploy — is mitigated by the corpus being pre-1.0, by the array index having no known integrity-bearing consumer, and by sequencing the rollout: land and deploy the MIF index change first, verify the live index.json, and only then flip the harness to fetch on demand.

  • modeled-information-format/MIF scripts/snapshot-ontology-version.py: the served-mirror + index.json generator changed by this decision.
  • scripts/gen-ontology-index.sh: this repo’s generator, already emitting the object+sha256+extends shape this decision standardizes on.
  • research-harness-template scripts/fetch-ontology.sh / scripts/check-ontology-lock.sh: the fail-closed consumer.

The contract is auditable in three places that must agree: the served mif-spec.dev/ontologies/index.json shape, this repo’s gen-ontology-index.sh output, and the harness fetcher’s reader. Any one diverging from the others is the signal that this decision has been violated.
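
One way to make that audit mechanical is to reduce each index to a shape signature and confirm that both the served index and the authoring generator's output cover the fields the fetcher reads. The function names and the choice of file, sha256, and extends as the reader's required fields are assumptions drawn from the three questions stated earlier.

```python
from typing import FrozenSet, Tuple

# Fields the fetcher is assumed to read per entry.
READER_FIELDS = frozenset({"file", "sha256", "extends"})


def shape_signature(index: dict) -> Tuple[str, FrozenSet[str]]:
    """Return (container kind, fields present in every entry)."""
    ontologies = index.get("ontologies")
    if isinstance(ontologies, dict):
        kind, entries = "object", list(ontologies.values())
    elif isinstance(ontologies, list):
        kind, entries = "array", list(ontologies)
    else:
        return ("invalid", frozenset())
    if not entries:
        return (kind, frozenset())
    common = frozenset.intersection(*(frozenset(e) for e in entries))
    return (kind, common)


def contract_agrees(served: dict, authored: dict) -> bool:
    """True only if both indexes are object-keyed and satisfy the reader."""
    for index in (served, authored):
        kind, fields = shape_signature(index)
        if kind != "object" or not READER_FIELDS <= fields:
            return False
    return True


# Placeholder indexes, for illustration only.
authored = {"ontologies": {"core": {
    "version": "1.0.0", "file": "core.ontology.yaml",
    "sha256": "0" * 64, "extends": []}}}
served_today = {"ontologies": [{"id": "core", "version": "1.0.0"}]}

print(shape_signature(served_today)[0])         # array
print(contract_agrees(served_today, authored))  # False
```

An empty index fails this check because no field can be shown to be present in every entry; that is a deliberate choice of the sketch.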

  • 2026-06-30: Pending. Proposed. The served index.json is still the array shape; the MIF generator change and redeploy, and the harness flip, are not yet landed. Flip to Compliant once mif-spec.dev/ontologies/index.json serves the object+sha256 shape and the harness vendors against it fail-closed.