# Origin

> How Mareforma records where a claim came from: LLM reasoning, real analysis, or building on the graph. This is distinct from how much to trust it.

Origin is orthogonal to trust. It records where a claim came from, that is,
**how it was derived**, not how much it should be trusted.

An ANALYTICAL claim can be PRELIMINARY. An INFERRED claim can be ESTABLISHED.
Origin and trust measure different things.

The API field is `classification`. This page explains what its values mean.

## The three origins

<AccordionGroup>
  <Accordion title="INFERRED (default)">
    LLM reasoning, synthesis, extrapolation. The default.

    Correct to use even for sophisticated reasoning, as long as the claim is
    not grounded in an analysis that actually ran against data. If the model is
    drawing on training knowledge, synthesising across papers, or extrapolating
    from context, the claim is INFERRED.
  </Accordion>

  <Accordion title="ANALYTICAL">
    Deterministic analysis ran against source data and returned output.

    Only use this when a real data pipeline ran and produced real output.
    If the pipeline failed silently and the agent fell back to LLM knowledge,
    the classification is still INFERRED: asserting ANALYTICAL on null data
    is an epistemic lie that the graph will permanently record.
  </Accordion>

  <Accordion title="DERIVED">
    Explicitly built on ESTABLISHED or REPLICATED claims already in the graph.

    The `supports[]` field must point to those claims. A DERIVED claim with
    empty `supports[]` is unverifiable: the graph cannot validate the chain.
  </Accordion>
</AccordionGroup>
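
The DERIVED rule can be checked mechanically before a claim is asserted. The sketch below is illustrative only: `validate_derived` and the `trust_by_id` mapping are hypothetical stand-ins, not part of Mareforma's API, and they assume the caller already knows the trust level of each claim it cites.

```python
# Hypothetical helper, not part of Mareforma: checks the DERIVED rule locally.
ALLOWED_SUPPORT_LEVELS = {"ESTABLISHED", "REPLICATED"}


def validate_derived(supports, trust_by_id):
    """Return a list of problems; an empty list means the rule is satisfied.

    supports:    claim IDs the DERIVED claim builds on
    trust_by_id: assumed mapping of claim ID -> trust level
    """
    if not supports:
        return ["supports[] is empty: the chain cannot be validated"]
    problems = []
    for claim_id in supports:
        level = trust_by_id.get(claim_id)
        if level is None:
            problems.append(f"{claim_id}: not found")
        elif level not in ALLOWED_SUPPORT_LEVELS:
            problems.append(f"{claim_id}: {level} is not ESTABLISHED or REPLICATED")
    return problems


# Placeholder IDs and levels, for illustration only.
trust_by_id = {"claim-a": "REPLICATED", "claim-b": "PRELIMINARY"}
print(validate_derived(["claim-a"], trust_by_id))
print(validate_derived(["claim-b"], trust_by_id))
print(validate_derived([], trust_by_id))
```

Returning a list of problems rather than raising on the first one lets an agent report every broken link in a single pass.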

## Why this matters

The origin captures the difference between two claims that look
identical as text but represent fundamentally different epistemic situations:

```python theme={"dark"}
# `graph` is assumed to be an already-open Mareforma graph.
# Pipeline ran, real omics data queried, output confirmed
graph.assert_claim(
    "IL-21 is overexpressed in SLE CD4+ T cells",
    classification="ANALYTICAL",
    source_name="medeadb",
)

# Pipeline failed silently — LLM answered from training knowledge
graph.assert_claim(
    "IL-21 is overexpressed in SLE CD4+ T cells",
    classification="INFERRED",  # honest
)
```

Both claims assert the same text. Only the origin reveals that one
is grounded in data and one is not.

## The ANALYTICAL lie

The most dangerous misuse of Mareforma is asserting `ANALYTICAL` when the
data pipeline returned null and the agent fell back to LLM knowledge. The
graph records this permanently: future agents may build on it, reviewers
may validate it, and the epistemic chain will be wrong at the root.

The rule: if you did not run deterministic code against real data and
receive real output, the classification is `INFERRED`. Even if the answer
looks right.
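
One way to make the rule hard to break is to derive the classification from the pipeline result instead of typing it by hand. This is a minimal sketch under assumptions: `classification_for` is a hypothetical helper, not a Mareforma function, and treating `None` and empty containers as "no real output" is only one possible definition. A pipeline that can legitimately return an empty result would need a stricter signal.

```python
# Container types whose emptiness is treated as "the pipeline returned nothing".
_EMPTYABLE = (str, bytes, list, tuple, dict, set, frozenset)


def classification_for(pipeline_output):
    """Pick the origin from what the pipeline actually returned.

    Hypothetical helper, not part of Mareforma. None and empty
    strings or collections count as "no real output" here.
    """
    if pipeline_output is None:
        return "INFERRED"
    if isinstance(pipeline_output, _EMPTYABLE) and len(pipeline_output) == 0:
        return "INFERRED"
    return "ANALYTICAL"


# Placeholder values, for illustration only.
print(classification_for(None))             # pipeline returned null
print(classification_for([]))               # pipeline returned no rows
print(classification_for({"rows": 128}))    # pipeline returned output
```

The returned string can then be passed as the `classification` argument to `graph.assert_claim`, so the origin follows from the data rather than from the agent's confidence in its answer.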

## DERIVED: building on the graph

```python theme={"dark"}
# Two independent agents established a finding
prior_id = "3f8a1b2c-..."  # REPLICATED claim

# A synthesiser builds explicitly on top
synthesis_id = graph.assert_claim(
    "Given the replicated finding, the likely mechanism is ...",
    classification="DERIVED",
    supports=[prior_id],
    generated_by="agent/synthesiser",
)
```

DERIVED claims make the inference chain explicit and traversable.
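
Traversal here means following `supports[]` links from a claim back to the claims it rests on. The sketch below walks such a chain over a plain dictionary. The `supports_by_id` mapping and the `ancestors` function are hypothetical and stand in for whatever lookup the graph provides.

```python
def ancestors(claim_id, supports_by_id):
    """Return every claim reachable from claim_id via supports[] links.

    Iterative depth-first walk. The seen set prevents repeated visits
    and keeps the loop finite if the links ever form a cycle.
    """
    seen = set()
    order = []
    stack = list(supports_by_id.get(claim_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(supports_by_id.get(current, []))
    return order


# Placeholder IDs: claim ID -> IDs listed in that claim's supports[].
supports_by_id = {
    "mechanism": ["synthesis"],
    "synthesis": ["finding-1", "finding-2"],
    "finding-1": [],
    "finding-2": [],
}
print(sorted(ancestors("mechanism", supports_by_id)))
```

A reviewer could use a walk like this to list every upstream claim whose origin and trust level should be checked before relying on a DERIVED conclusion.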
