All persistent state lives in a single SQLite file at <project_root>/.mareforma/graph.db (WAL mode, ACID). Schema version: 1.

Claims table

| Field | Type | Default | Nullable | Description |
| --- | --- | --- | --- | --- |
| claim_id | TEXT | UUID at insert | No | Primary key |
| text | TEXT | required | No | The falsifiable assertion. Hard cap 100,000 chars. Sanitized on write (zero-width / bidi / control chars stripped) |
| classification | TEXT | INFERRED | No | Epistemic origin: INFERRED / ANALYTICAL / DERIVED |
| support_level | TEXT | PRELIMINARY | No | Graph-derived trust: PRELIMINARY / REPLICATED / ESTABLISHED. Transitions enforced by BEFORE triggers |
| idempotency_key | TEXT | NULL | Yes | UNIQUE: same key → same claim_id, no INSERT |
| validated_by | TEXT | NULL | Yes | Cosmetic display label set by graph.validate() (authoritative identity lives in validation_signature) |
| validated_at | TEXT | NULL | Yes | UTC ISO 8601 timestamp of validation |
| status | TEXT | open | No | Editorial: open / contested / retracted |
| source_name | TEXT | NULL | Yes | Data source name; required for ANALYTICAL to be meaningful |
| generated_by | TEXT | agent | No | Agent identifier; independence signal for REPLICATED |
| supports_json | TEXT | [] | No | JSON array of upstream claim_ids or DOIs |
| contradicts_json | TEXT | [] | No | JSON array of claim_ids this finding contests |
| comparison_summary | TEXT | NULL | Yes | Human-readable diff note for contradictions. Editable on signed claims (not part of signed payload) |
| branch_id | TEXT | main | No | Reserved for future branching; currently always main |
| unresolved | INTEGER | 0 | No | 1 when a DOI in supports[]/contradicts[] could not be HEAD-checked against Crossref/DataCite. Blocks REPLICATED promotion until refresh_unresolved() clears it |
| signature_bundle | TEXT | NULL | Yes | DSSE v1 envelope wrapping an in-toto Statement v1 payload (predicateType=urn:mareforma:predicate:claim:v1). The signature covers the DSSE Pre-Authentication Encoding (PAE), not the payload bytes directly, so a signature on one payloadType cannot be replayed as a signature on another |
| transparency_logged | INTEGER | 1 | No | 1 when the claim has been (or did not need to be) submitted to Sigstore-Rekor. Blocks REPLICATED until refresh_unsigned() flips it. Restore downgrades a TOML-asserted 1 to 0 when the bundle has no rekor block; hand-edited TOML cannot fake a Rekor inclusion |
| validation_signature | TEXT | NULL | Yes | Signed (claim_id, validator_keyid, validated_at) envelope. CHECK constraint requires this on every ESTABLISHED row |
| validator_keyid | TEXT | NULL | Yes | Denormalized signer keyid from validation_signature for indexable reputation aggregation |
| artifact_hash | TEXT | NULL | Yes | SHA-256 hex digest of the output bytes (figure, CSV, model) backing the claim. Bound into the signed payload; gates REPLICATED when both peers supply a hash |
| prev_hash | TEXT | NULL | Yes | Append-only chain link sha256(prev_chain_link \|\| canonical_statement_bytes). The chain input is the same bytes the signature covers, so chain integrity and signature integrity move together. UNIQUE partial index catches branched chains |
| ev_risk_of_bias | INTEGER | 0 | No | GRADE downgrade domain in [-2, 0]. Methodological flaws (allocation, blinding, attrition). CHECK constraint bounds the value |
| ev_inconsistency | INTEGER | 0 | No | GRADE downgrade domain in [-2, 0]. Heterogeneity of effect across studies |
| ev_indirectness | INTEGER | 0 | No | GRADE downgrade domain in [-2, 0]. Population / intervention / outcome mismatch |
| ev_imprecision | INTEGER | 0 | No | GRADE downgrade domain in [-2, 0]. Wide CIs / small N |
| ev_pub_bias | INTEGER | 0 | No | GRADE downgrade domain in [-2, 0]. Selective reporting / file-drawer effect |
| evidence_json | TEXT | {} | No | Full GRADE EvidenceVector serialized as canonical JSON: the five downgrade domains plus upgrade flags (large_effect, dose_response, opposing_confounding), rationale dict (required for any nonzero domain), and reporting_compliance list. Bound into the signed Statement; denormalized into the ev_* columns for queryable filters |
| statement_cid | TEXT | NULL | Yes | Content identifier of the signed in-toto Statement: sha256(canonicalize(statement)) hex. Restore re-derives this from the row’s fields + evidence_json and refuses any mismatch with the stored value |
| t_invalid | INTEGER | NULL | Yes | Invalidation timestamp set by the contradiction_invalidates_older trigger when a signed contradiction_verdicts row references this claim. Default query() / search() excludes invalidated rows; pass include_invalidated=True for audit-mode listings |
| convergence_retry_needed | INTEGER | 0 | No | 1 when _maybe_update_replicated swallowed a SQLite error during the post-INSERT promotion check. EpistemicGraph.refresh_convergence() walks flagged rows to retry detection and clear the flag. Round-trips through claims.toml so the operator’s audit list survives restore |
| created_at | TEXT | UTC now | No | ISO 8601 UTC timestamp |
| updated_at | TEXT | UTC now | No | ISO 8601 UTC; updated on every mutation |

Row-level CHECK: every row whose support_level = 'ESTABLISHED' must have a non-NULL validation_signature. The CHECK is the row-level belt to the trigger’s transition-level suspenders.
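
The signature_bundle and prev_hash rows describe two byte-level constructions. The sketch below illustrates both under stated assumptions: pae() follows the public DSSE v1 Pre-Authentication Encoding, while chain_link() is a hypothetical helper that assumes the previous link is stored as hex and that the first row chains from an empty prefix. Neither function is taken from the mareforma codebase.

```python
import hashlib
from typing import Optional


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 PAE: the bytes a signer signs instead of the raw payload."""
    type_bytes = payload_type.encode("utf-8")
    # Length-prefixing the type binds it to the signature, so the same
    # payload under a different payloadType yields different signed bytes.
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload
    )


def chain_link(prev_link_hex: Optional[str], statement_bytes: bytes) -> str:
    """Hypothetical chain step: sha256(prev_chain_link || statement bytes)."""
    # Assumption: the genesis row has no predecessor and uses an empty prefix.
    prev = bytes.fromhex(prev_link_hex) if prev_link_hex else b""
    return hashlib.sha256(prev + statement_bytes).hexdigest()


statement = b'{"claim":"example"}'
signed_as_claim = pae("application/vnd.in-toto+json", statement)
signed_as_verdict = pae(
    "application/vnd.mareforma.replication-verdict+json", statement
)
first = chain_link(None, statement)
second = chain_link(first, statement)
```

Because the payload type is part of the encoded bytes, signed_as_claim and signed_as_verdict differ even though the payload is identical, which is the replay protection the table describes.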

Indexes

IndexColumn(s)TypeNotes
idx_claims_statusstatusNon-uniqueFilters by open, contested, retracted
idx_claims_sourcesource_nameNon-uniqueFilters by data source
idx_claims_generated_bygenerated_byNon-uniqueFilters by agent
idx_claims_support_levelsupport_levelNon-uniqueFilters by trust tier
idx_claims_unresolvedunresolvedNon-uniqueAccelerates refresh_unresolved()
idx_claims_transparency_loggedtransparency_loggedNon-uniqueAccelerates refresh_unsigned()
idx_claims_artifact_hashartifact_hashUnique (partial)WHERE artifact_hash IS NOT NULL: only rows that opt in to the gate
idx_claims_idempotency_keyidempotency_keyUnique (partial)WHERE idempotency_key IS NOT NULL
idx_claims_prev_hashprev_hashUnique (partial)WHERE prev_hash IS NOT NULL: catches branched chains from concurrent writers or manual SQL tamper
idx_claims_validator_keyidvalidator_keyidNon-unique (partial)WHERE validator_keyid IS NOT NULL: reputation aggregation
idx_replication_clustercluster_idNon-uniqueOn replication_verdicts
idx_replication_membermember_claim_idNon-uniqueOn replication_verdicts
idx_contradiction_membermember_claim_idNon-uniqueOn contradiction_verdicts

State-machine triggers

Two BEFORE triggers enforce the support-level state machine at the storage layer. Defense in depth: a tampered Python interpreter cannot relax these rules.

claims_insert_state_check rejects:
  • support_level outside {PRELIMINARY, ESTABLISHED} (REPLICATED can only be reached via UPDATE)
  • support_level = 'ESTABLISHED' without a validation_signature (only the seed-claim path satisfies this, since seeds carry a signed seed envelope)
  • support_level = 'PRELIMINARY' with validated_by or validated_at set

claims_update_state_check rejects:
  • PRELIMINARY → ESTABLISHED (must pass through REPLICATED first)
  • REPLICATED → PRELIMINARY
  • Any transition out of ESTABLISHED
  • → ESTABLISHED without a validation_signature

Trigger errors carry mareforma:state:<from>-><to> codes that Python translates to IllegalStateTransitionError.

Further triggers protect the editorial status, the signed fields, and the verdict tables:

claims_update_status_terminal: retracted is terminal. Any UPDATE that transitions a row out of retracted raises mareforma:state:retracted_is_terminal. To resurrect a withdrawn finding, assert a new claim citing the old via contradicts=[<old_claim_id>].

claims_signed_fields_no_laundering: append-only over the signed predicate. Refuses any direct-SQL UPDATE that changes a signed-predicate value (text, classification, generated_by, supports_json, contradicts_json, source_name, artifact_hash, ev_*, evidence_json, statement_cid, prev_hash, created_at) on a row whose signature_bundle IS NOT NULL. The value comparison fires only when something actually changed, so multi-column UPDATEs that re-emit unchanged values (e.g. status-only edits via update_claim) pass through. Raises mareforma:append_only:signed_field_locked.

contradiction_invalidates_older: AFTER INSERT on contradiction_verdicts. Sets claims.t_invalid = NEW.created_at on the older of the two referenced claims (lex-smaller claim_id as the deterministic tie-break when timestamps collide), idempotent via WHERE t_invalid IS NULL.

replication_verdicts_append_only + replication_verdicts_no_delete: UPDATE on the signed columns and any DELETE raise mareforma:append_only:verdict_locked / verdict_delete_blocked respectively. The same holds for contradiction_verdicts via the symmetric pair.
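
To make the belt-and-suspenders idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table is cut down to three columns and the trigger implements only the PRELIMINARY → ESTABLISHED rule; the real trigger bodies are not shown in this document, so the SQL below is an illustrative assumption, as is the try_update helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (
    claim_id TEXT PRIMARY KEY,
    support_level TEXT NOT NULL DEFAULT 'PRELIMINARY',
    validation_signature TEXT,
    -- Row-level check: ESTABLISHED rows must carry a validation signature.
    CHECK (support_level != 'ESTABLISHED' OR validation_signature IS NOT NULL)
);
-- Transition-level check: only one of the documented rules is sketched here.
CREATE TRIGGER claims_update_state_check
BEFORE UPDATE OF support_level ON claims
WHEN OLD.support_level = 'PRELIMINARY' AND NEW.support_level = 'ESTABLISHED'
BEGIN
    SELECT RAISE(ABORT, 'mareforma:state:PRELIMINARY->ESTABLISHED');
END;
INSERT INTO claims (claim_id) VALUES ('c1');
""")


def try_update(level, signature=None):
    """Attempt a transition; return 'ok' or the storage-layer error text."""
    try:
        conn.execute(
            "UPDATE claims SET support_level = ?, validation_signature = ? "
            "WHERE claim_id = 'c1'",
            (level, signature),
        )
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)


results = [
    try_update("ESTABLISHED", "sig"),  # blocked by the trigger (skips REPLICATED)
    try_update("REPLICATED"),          # allowed
    try_update("ESTABLISHED"),         # blocked by the row-level CHECK (no signature)
    try_update("ESTABLISHED", "sig"),  # allowed
]
```

The first rejection comes from the trigger and the third from the CHECK, which shows why both layers exist: the trigger sees the transition, the CHECK sees only the resulting row.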

Valid values

VALID_CLASSIFICATIONS = ("INFERRED", "ANALYTICAL", "DERIVED")
VALID_SUPPORT_LEVELS  = ("PRELIMINARY", "REPLICATED", "ESTABLISHED")
VALID_STATUSES        = ("open", "contested", "retracted")
Available at runtime via mareforma.schema():
import mareforma

s = mareforma.schema()
s["classifications"]  # ['INFERRED', 'ANALYTICAL', 'DERIVED']
s["support_levels"]   # ['PRELIMINARY', 'REPLICATED', 'ESTABLISHED']
s["statuses"]         # ['open', 'contested', 'retracted']
s["schema_version"]   # 1

replication_verdicts table

Signed replication verdicts produced by enrolled validators. The OSS core accepts verdicts from any enrolled identity; the predicates that generate verdicts (semantic-cluster, cross-method, hash-match, shared-resolved-upstream) live outside the OSS core and call Graph.record_replication_verdict() to write here. Append-only at the SQL layer: UPDATE on signed columns and DELETE are both refused by triggers.

| Field | Type | Description |
| --- | --- | --- |
| verdict_id | TEXT (PK) | Caller-supplied unique id |
| cluster_id | TEXT | Caller-supplied cluster identifier shared across all verdicts in one replication cluster |
| member_claim_id | TEXT | FK to claims(claim_id). The claim being asserted as replicated |
| other_claim_id | TEXT | FK to claims(claim_id). Optional second member of the pair (NULL for single-row cross-method verdicts) |
| method | TEXT | CHECK enum: hash-match / semantic-cluster / shared-resolved-upstream / cross-method |
| confidence_json | TEXT | Canonical JSON of the confidence dict (e.g. {"cosine":0.91,"nli_forward":0.88}), never fused into a single score |
| issuer_keyid | TEXT | FK to validators(keyid). The signing validator’s keyid |
| signature | BLOB | Ed25519 signature over the DSSE PAE of the canonical payload (payloadType=application/vnd.mareforma.replication-verdict+json) |
| created_at | TEXT | UTC ISO 8601 |

Side effect: recording a replication verdict promotes the referenced claims from PRELIMINARY to REPLICATED (only if still PRELIMINARY AND status='open' AND t_invalid IS NULL). INSERT + promotion run in a single BEGIN IMMEDIATE transaction so a concurrent contradiction cannot land between the two writes.
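
The side effect above can be sketched with plain sqlite3. The tables are reduced to the columns the promotion guard reads, and record_verdict is a hypothetical stand-in for the real Graph.record_replication_verdict(), whose implementation is not shown in this document.

```python
import sqlite3

# isolation_level=None disables implicit transactions so the sketch can
# issue BEGIN IMMEDIATE / COMMIT itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE claims (
    claim_id TEXT PRIMARY KEY,
    support_level TEXT NOT NULL DEFAULT 'PRELIMINARY',
    status TEXT NOT NULL DEFAULT 'open',
    t_invalid INTEGER
);
CREATE TABLE replication_verdicts (
    verdict_id TEXT PRIMARY KEY,
    member_claim_id TEXT NOT NULL REFERENCES claims(claim_id)
);
INSERT INTO claims (claim_id, status) VALUES ('c1', 'open'), ('c2', 'retracted');
""")


def record_verdict(verdict_id, claim_id):
    """Insert the verdict and run the guarded promotion in one transaction."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other writer can
    # land a contradiction between the INSERT and the UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "INSERT INTO replication_verdicts VALUES (?, ?)",
            (verdict_id, claim_id),
        )
        conn.execute(
            "UPDATE claims SET support_level = 'REPLICATED' "
            "WHERE claim_id = ? AND support_level = 'PRELIMINARY' "
            "AND status = 'open' AND t_invalid IS NULL",
            (claim_id,),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


record_verdict("v1", "c1")  # open claim: promoted
record_verdict("v2", "c2")  # retracted claim: verdict stored, no promotion
levels = dict(conn.execute("SELECT claim_id, support_level FROM claims"))
```

Putting the guard in the UPDATE's WHERE clause keeps the promotion a single conditional write rather than a read followed by a write.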

contradiction_verdicts table

Signed contradiction verdicts. Same shape as replication_verdicts but binds a refutation between two claims. INSERT fires the contradiction_invalidates_older trigger which sets t_invalid on the older referenced claim.

| Field | Type | Description |
| --- | --- | --- |
| verdict_id | TEXT (PK) | Caller-supplied unique id |
| member_claim_id | TEXT | FK to claims(claim_id) |
| other_claim_id | TEXT | FK to claims(claim_id). CHECK constraint refuses member == other (self-contradiction is meaningless and would let a single validator invalidate any claim unilaterally) |
| confidence_json | TEXT | Canonical JSON of the confidence dict |
| issuer_keyid | TEXT | FK to validators(keyid) |
| signature | BLOB | DSSE-PAE Ed25519 signature (payloadType=application/vnd.mareforma.contradiction-verdict+json) |
| created_at | TEXT | UTC ISO 8601 |
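
One way to express the contradiction_invalidates_older behavior in SQLite is sketched below. The trigger body is an assumption reconstructed from the prose (older claim wins the invalidation, lex-smaller claim_id breaks ties, already-invalidated rows are left alone); the shipped trigger may be written differently. The tables carry only the columns the trigger touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims (claim_id TEXT PRIMARY KEY, created_at TEXT NOT NULL, t_invalid);
CREATE TABLE contradiction_verdicts (
    verdict_id TEXT PRIMARY KEY,
    member_claim_id TEXT NOT NULL,
    other_claim_id TEXT NOT NULL,
    created_at TEXT NOT NULL,
    CHECK (member_claim_id != other_claim_id)
);
CREATE TRIGGER contradiction_invalidates_older
AFTER INSERT ON contradiction_verdicts
BEGIN
    UPDATE claims SET t_invalid = NEW.created_at
    WHERE t_invalid IS NULL
      AND claim_id = (
          SELECT claim_id FROM claims
          WHERE claim_id IN (NEW.member_claim_id, NEW.other_claim_id)
          ORDER BY created_at ASC, claim_id ASC
          LIMIT 1
      );
END;
INSERT INTO claims (claim_id, created_at) VALUES
    ('old',   '2024-01-01T00:00:00+00:00'),
    ('new',   '2024-02-01T00:00:00+00:00'),
    ('tie-a', '2024-03-01T00:00:00+00:00'),
    ('tie-b', '2024-03-01T00:00:00+00:00');
INSERT INTO contradiction_verdicts VALUES
    ('v1', 'new',   'old',   '2024-02-02T00:00:00+00:00'),
    ('v2', 'tie-b', 'tie-a', '2024-03-02T00:00:00+00:00');
""")

invalidated = dict(conn.execute("SELECT claim_id, t_invalid FROM claims"))

# The CHECK constraint refuses a self-contradiction outright.
try:
    conn.execute(
        "INSERT INTO contradiction_verdicts VALUES "
        "('v3', 'new', 'new', '2024-04-01T00:00:00+00:00')"
    )
    self_contradiction_refused = False
except sqlite3.IntegrityError:
    self_contradiction_refused = True
```

After the two verdicts, only 'old' and 'tie-a' carry a t_invalid value; 'new' and 'tie-b' stay NULL.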

rekor_inclusions table

Sidecar recording every successful Sigstore-Rekor submission, written by _record_rekor_inclusion as step 3 of the Rekor saga. Step 4 (the claims-row UPDATE that attaches the Rekor coords to signature_bundle) reads from this table on retry instead of re-submitting, so a single Rekor submission produces exactly one log entry even when the local UPDATE crashes mid-saga. Append-only at the SQL layer: both UPDATE and DELETE are refused by triggers, mirroring the verdict-table protections. The saga’s write uses INSERT ON CONFLICT(claim_id) DO NOTHING, so a legitimate retry on the same claim_id is a silent no-op (the original row is preserved) and a SQL-writer cannot launder forged Rekor coords through the recovery path in refresh_unsigned().

| Field | Type | Description |
| --- | --- | --- |
| claim_id | TEXT (PK) | FK to claims(claim_id). One Rekor inclusion per claim |
| uuid | TEXT | The Rekor log entry’s UUID |
| log_index | INTEGER | The Rekor log entry’s sequence index |
| integrated_time | INTEGER | Unix timestamp of inclusion (NULL if Rekor omitted it) |
| raw_response_b64 | TEXT | Base64-encoded full Rekor response JSON, preserved so the recovery path can re-attach byte-identical coords to the augmented signature bundle |
| recorded_at | TEXT | UTC ISO 8601 of the local write |
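
The INSERT ON CONFLICT(claim_id) DO NOTHING behavior can be seen in isolation with a small sketch. The table is trimmed to three columns and record_inclusion is a hypothetical helper, not the real _record_rekor_inclusion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rekor_inclusions ("
    "claim_id TEXT PRIMARY KEY, uuid TEXT NOT NULL, log_index INTEGER NOT NULL)"
)


def record_inclusion(claim_id, uuid, log_index):
    """Return True if a row was written, False if one already existed."""
    cur = conn.execute(
        "INSERT INTO rekor_inclusions (claim_id, uuid, log_index) "
        "VALUES (?, ?, ?) ON CONFLICT(claim_id) DO NOTHING",
        (claim_id, uuid, log_index),
    )
    return cur.rowcount == 1


first_write = record_inclusion("c1", "rekor-uuid-original", 42)
# A retry, or an attempt to swap in different coords, is a silent no-op.
second_write = record_inclusion("c1", "rekor-uuid-forged", 99)
stored_uuid = conn.execute(
    "SELECT uuid FROM rekor_inclusions WHERE claim_id = 'c1'"
).fetchone()[0]
```

The second call neither raises nor overwrites, so the recovery path always reads back the coords from the first successful submission.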

doi_cache table

Persistent cache of DOI resolution results to avoid repeated network calls.

| Field | Type | Description |
| --- | --- | --- |
| doi | TEXT (PK) | Normalised DOI |
| resolved | INTEGER | 1 if Crossref or DataCite confirmed it |
| registry | TEXT | crossref / datacite / NULL |
| last_checked_at | TEXT | UTC ISO 8601; TTLs are 30 days for resolved, 24 hours for unresolved |
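
A cache-freshness check following the two TTLs might look like the sketch below. The function name is_fresh and the assumption that last_checked_at is stored with an explicit +00:00 offset are illustrative, not taken from the mareforma source.

```python
from datetime import datetime, timedelta, timezone

RESOLVED_TTL = timedelta(days=30)     # confirmed DOIs are re-checked monthly
UNRESOLVED_TTL = timedelta(hours=24)  # failed lookups are retried daily


def is_fresh(resolved: bool, last_checked_at: str, now: datetime) -> bool:
    """True while the cached row can be used without a new network call."""
    checked = datetime.fromisoformat(last_checked_at)
    ttl = RESOLVED_TTL if resolved else UNRESOLVED_TTL
    return now - checked < ttl


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
resolved_recent = is_fresh(True, "2024-06-01T12:00:00+00:00", now)     # 14 days old
unresolved_stale = is_fresh(False, "2024-06-13T12:00:00+00:00", now)   # 48 hours old
```

The shorter TTL for unresolved DOIs means a transient registry outage delays REPLICATED promotion by at most about a day once the registry recovers and the cache is refreshed.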

validators table

The per-project set of public keys allowed to promote claims to ESTABLISHED.

| Field | Type | Description |
| --- | --- | --- |
| keyid | TEXT (PK) | SHA-256 of the Ed25519 raw public bytes |
| pubkey_pem | TEXT | Ed25519 public key in PEM form |
| identity | TEXT | Display label (256-char cap, control characters and display-spoofing forms rejected) |
| enrolled_at | TEXT | UTC ISO 8601 |
| enrolled_by_keyid | TEXT | Parent validator (root rows have enrolled_by_keyid = keyid) |
| enrollment_envelope | TEXT | DSSE-style envelope signed by the parent. Verifying is_enrolled walks the chain back to a self-signed root |

The first key opened against a fresh graph.db auto-enrolls as the root with a self-signed envelope (BEGIN IMMEDIATE guards against two simultaneous opens both becoming roots). The chain walk enforces a singleton-root invariant: if two rows have keyid == enrolled_by_keyid, neither is trusted. The walk is capped at 64 hops. Removal is intentionally unsupported for now; validator history is append-only.
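
The structural part of the chain walk can be sketched over a plain mapping of keyid to enrolled_by_keyid. This illustration deliberately omits envelope signature verification, and the exact hop-counting convention is an assumption; it only shows the singleton-root and hop-cap rules.

```python
from typing import Dict

MAX_HOPS = 64


def is_enrolled(validators: Dict[str, str], keyid: str) -> bool:
    """Walk keyid -> enrolled_by_keyid until the self-signed root is reached."""
    roots = [k for k, parent in validators.items() if k == parent]
    if len(roots) != 1:
        return False  # zero or several self-signed roots: trust nothing
    current = keyid
    for _ in range(MAX_HOPS):
        parent = validators.get(current)
        if parent is None:
            return False  # unknown key or dangling parent
        if parent == current:
            return True   # reached the single root
        current = parent
    return False          # hop cap exceeded (e.g. a cycle)


chain = {"root": "root", "alice": "root", "bob": "alice"}
two_roots = {"root": "root", "rogue": "rogue", "alice": "root"}
cyclic = {"root": "root", "x": "y", "y": "x"}
```

With these inputs, "bob" is enrolled in chain, nobody is enrolled in two_roots, and "x" fails in cyclic because the walk never reaches the root within the cap.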

Trust layer tables

The trust layer (see Findings) adds six tables for structured findings. They are additive: CREATE TABLE IF NOT EXISTS, no migration, user_version stays 1. A finding is an evidence tree (finding → evidence_lines → contrasts → effect_estimates), anchored to a content-addressed proposition and a pre-registered prediction, and attested by an existing signed claim.
proposition (content_id)
   ├── prediction (plan_id)        the pre-registered rule
   └── finding (finding_id) ──► claim_id        the signed attestation
          └── evidence_line (line_id, data_id)
                 └── contrast (contrast_id)
                        └── effect_estimate (estimate_id)

propositions table

The content-addressed unit of sameness. content_id (PK) is the answer hash; frame_id is the question hash.

| Field | Type | Notes |
| --- | --- | --- |
| content_id | TEXT (PK) | sha256 over normalized (subject, relation, object, scope, direction, magnitude) |
| frame_id | TEXT | Question hash (direction + magnitude dropped). Indexed |
| subject / relation / object | TEXT | The typed parts of the assertion |
| direction | TEXT | CHECK enum INCREASES / DECREASES / NO_EFFECT / PRESENT / ABSENT (UNSPECIFIED is never stored) |
| scope_json | TEXT | Canonical JSON of the scope map |
| magnitude | TEXT | Optional quantitative refinement; participates in content_id |
| content_id_policy | TEXT | Identity-policy stamp, default content_id@v1 |
| schema_version | TEXT | Trust-schema stamp, default trust@v1 |
| created_at | TEXT | UTC ISO 8601 |

Indexes: idx_prop_frame (frame_id), idx_prop_frame_dir (frame_id, direction).
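
The relationship between the two hashes can be illustrated as follows. The normalization used here (sorted-key compact JSON of the raw field values) is a placeholder assumption; the actual content_id@v1 policy is not specified in this document, so the digests below will not match real mareforma ids.

```python
import hashlib
import json


def _digest(fields: dict) -> str:
    # Placeholder canonicalization: sorted keys, compact separators.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def frame_id(subject, relation, obj, scope):
    """Question hash: direction and magnitude are dropped."""
    return _digest(
        {"subject": subject, "relation": relation, "object": obj, "scope": scope}
    )


def content_id(subject, relation, obj, scope, direction, magnitude=None):
    """Answer hash: the frame fields plus direction and magnitude."""
    return _digest(
        {
            "subject": subject, "relation": relation, "object": obj,
            "scope": scope, "direction": direction, "magnitude": magnitude,
        }
    )


scope = {"species": "mouse", "tissue": "liver"}
up = content_id("drug_x", "affects", "gene_y", scope, "INCREASES")
down = content_id("drug_x", "affects", "gene_y", scope, "DECREASES")
question = frame_id("drug_x", "affects", "gene_y", scope)
```

Two opposing answers (up and down) get distinct content_ids but share one frame_id, which is what lets the idx_prop_frame_dir index group competing directions under the same question.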

predictions table

The pre-registered plan, bound to one proposition. plan_id (PK) is content-addressed over (content_id, prediction fields), so registering the same plan twice is a no-op.

| Field | Type | Notes |
| --- | --- | --- |
| plan_id | TEXT (PK) | Content-addressed plan id |
| content_id | TEXT | FK to propositions(content_id). Indexed (idx_pred_content) |
| inference_regime | TEXT | CHECK enum frequentist |
| test_type | TEXT | CHECK enum superiority / equivalence |
| direction_of_interest | TEXT | CHECK enum increase / decrease (superiority only) |
| equivalence_lower / equivalence_upper | REAL | Equivalence region (TOST only) |
| alpha | REAL | CHECK alpha > 0 AND alpha < 1 |
| preregistered | INTEGER | CHECK IN (0, 1) |
| registered_at | TEXT | UTC ISO 8601 |

A registered plan is append-only: predictions_append_only (BEFORE UPDATE of every immutable column) and predictions_no_delete (BEFORE DELETE) raise mareforma:append_only:prediction_locked / prediction_delete_blocked, so the gap between registration and evidence is a real pre-registration guarantee.

findings table

One attestation plus its computed bearing on a proposition under a plan.

| Field | Type | Notes |
| --- | --- | --- |
| finding_id | TEXT (PK) | UUID |
| content_id | TEXT | FK to propositions(content_id). Indexed (idx_find_content) |
| plan_id | TEXT | FK to predictions(plan_id) |
| claim_id | TEXT | FK to claims(claim_id), the signed attestation |
| bearing_direction | TEXT | NOT NULL CHECK enum supports / refutes / neutral. Computed by the gate, denormalized here for Status counting |
| created_at | TEXT | UTC ISO 8601 |

evidence_lines table

One line of evidence; a finding may carry several. Independence is counted by distinct run (generated_by), with data_id as a guard so the same dataset is not counted twice.

| Field | Type | Notes |
| --- | --- | --- |
| line_id | TEXT (PK) | UUID |
| finding_id | TEXT | FK to findings(finding_id). Indexed (idx_line_finding) |
| data_id | TEXT | Dataset key; guards against counting the same dataset twice across runs. Indexed (idx_line_data) |
| modality / provenance_id / design_type | TEXT | Optional descriptors |
| created_at | TEXT | UTC ISO 8601 |

contrasts table

The comparison a line quantifies (control type only, for now).

| Field | Type | Notes |
| --- | --- | --- |
| contrast_id | TEXT (PK) | UUID |
| line_id | TEXT | FK to evidence_lines(line_id) |
| control_type | TEXT | CHECK enum positive / negative / vehicle / sham / comparative |

effect_estimates table

The estimate the gate reads. Minimal metafor-named field set.

| Field | Type | Notes |
| --- | --- | --- |
| estimate_id | TEXT (PK) | UUID |
| contrast_id | TEXT | FK to contrasts(contrast_id). Indexed (idx_estimate_contrast) |
| estimate_value | REAL | The point estimate |
| effect_type | TEXT | CHECK enum of 13 metafor measure values (SMD, OR, logOR, RR, HR, ROM, …) |
| scale | TEXT | CHECK enum raw / log |
| p_value | REAL | CHECK NULL OR (0 ≤ p_value ≤ 1) |
| ci_lower / ci_upper / ci_level | REAL | Confidence interval (all-or-none) |
| n_total | INTEGER | Optional total N |

The Status of a proposition is not stored. It is derived on read from the independent supporting / refuting line counts, counted by distinct run (status_policy@v2), so improving the rule later is a new policy over the same data, not a migration.
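
A derive-on-read count could be sketched as below. This is not the status_policy@v2 rule itself, which this document does not spell out; it only illustrates counting lines by distinct run with data_id as a guard. The per-direction scoping of the guard and the tuple layout (bearing, generated_by, data_id) are assumptions.

```python
def independent_counts(lines):
    """Count supporting / refuting lines, one per distinct run and dataset."""
    counts = {"supports": 0, "refutes": 0}
    seen_runs = {"supports": set(), "refutes": set()}
    seen_data = {"supports": set(), "refutes": set()}
    for bearing, run, data_id in lines:
        if bearing not in counts:
            continue  # neutral lines do not move either count
        if run in seen_runs[bearing] or data_id in seen_data[bearing]:
            continue  # same run again, or same dataset under another run
        seen_runs[bearing].add(run)
        if data_id is not None:
            seen_data[bearing].add(data_id)
        counts[bearing] += 1
    return counts


lines = [
    ("supports", "run-a", "ds1"),
    ("supports", "run-a", "ds2"),  # same run: not independent
    ("supports", "run-b", "ds1"),  # same dataset: not independent
    ("supports", "run-c", "ds3"),
    ("refutes", "run-d", "ds1"),
    ("neutral", "run-e", "ds9"),
]
counts = independent_counts(lines)  # {'supports': 2, 'refutes': 1}
```

Because the counts are recomputed from stored rows on every read, swapping in a stricter rule later changes only this function, not the data.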

Schema versioning

The schema version is stored in SQLite’s user_version pragma.

| Version | State |
| --- | --- |
| 0 | Fresh database: full schema applied on first open_db() |
| 1 | The current schema |
| Any other | DatabaseError: delete graph.db to start fresh; claims.toml is the backup |

No in-place migrations in this release. Adding a column, index, or trigger means updating the schema in place; existing development databases get the schema-validation error and the operator deletes graph.db (claims.toml is the restore artifact). Versioned migrations become relevant only after a 1.0 release establishes a stable schema with real users on it.
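
The version gate in the table can be sketched with the user_version pragma directly. The function and exception names here are hypothetical, and the step that would apply the full schema is left as a comment.

```python
import sqlite3

SCHEMA_VERSION = 1


class SchemaVersionError(Exception):
    """Hypothetical stand-in for the DatabaseError named in the table."""


def check_user_version(conn: sqlite3.Connection) -> str:
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version == 0:
        # Fresh database: the full schema would be created here.
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
        return "initialized"
    if version == SCHEMA_VERSION:
        return "current"
    raise SchemaVersionError(
        f"unsupported schema version {version}: delete graph.db and "
        "restore from claims.toml"
    )


conn = sqlite3.connect(":memory:")
first_open = check_user_version(conn)   # "initialized"
second_open = check_user_version(conn)  # "current"
```

SQLite initializes user_version to 0 for every new database file, which is why 0 can double as the "fresh database" marker without any extra bookkeeping table.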

Storage

graph.db is stored at <project_root>/.mareforma/graph.db. It is created automatically on the first mareforma.open() call, along with the .mareforma/ directory if that does not exist yet. claims.toml at the project root is a human-readable backup of all claims, written after every mutation. It is not the source of truth (graph.db is), but it survives graph.db deletion.

Runtime PRAGMAs

open_db() sets these connection-level PRAGMAs on every open:

| PRAGMA | Value | Why |
| --- | --- | --- |
| journal_mode | WAL | Concurrent readers + one writer without blocking |
| foreign_keys | ON | SQLite default is OFF. The replication_verdicts, contradiction_verdicts, and rekor_inclusions tables have FK references to validators(keyid) and claims(claim_id); without this PRAGMA, those constraints would be advisory and a direct-SQL INSERT with a fabricated keyid would silently succeed |
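
The effect of the two PRAGMAs can be checked with a short sketch. open_connection is a hypothetical stand-in for open_db(), the two tables are reduced to the columns needed to show the foreign key, and a temporary file is used because WAL mode does not apply to in-memory databases.

```python
import os
import sqlite3
import tempfile


def open_connection(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")  # persists in the database file
    conn.execute("PRAGMA foreign_keys = ON")   # per-connection, set on every open
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "graph.db")
conn = open_connection(db_path)
conn.executescript("""
CREATE TABLE validators (keyid TEXT PRIMARY KEY);
CREATE TABLE replication_verdicts (
    verdict_id TEXT PRIMARY KEY,
    issuer_keyid TEXT NOT NULL REFERENCES validators(keyid)
);
""")

try:
    conn.execute(
        "INSERT INTO replication_verdicts VALUES ('v1', 'fabricated-keyid')"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True  # the fabricated keyid is rejected

journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
```

Since foreign_keys is a connection-level setting, any tool that opens graph.db without setting it bypasses the FK checks, which is why the table stresses that open_db() applies it on every open.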