What is EEM?
External Epistemic Memory (EEM) is knowledge that lives outside the model, carries its justifications with it, and lets you understand how the system knows what it knows.
EEM is defined by three load-bearing properties: it is external (outside model parameters), epistemic (justified beliefs with truth values), and memory (persistent semantic knowledge). Each property is necessary. Together they distinguish EEM from every other approach to LLM knowledge management.
Three Properties
External
Knowledge lives outside model parameters, in a separate substrate. It survives compaction, model swaps, and session boundaries. It is separable, copyable, shareable, inspectable, editable, and auditable. Of these six qualities, auditability is the most epistemically important: it makes "how do you know that?" answerable by traversing the justification chain.
Epistemic
Not just facts but justified beliefs with truth values (IN/OUT), retraction cascades, contradiction records (nogoods), and derivation depth. This is what distinguishes EEM from RAG, which is external semantic memory but not epistemic.
Memory
Persistent structured knowledge in Tulving's semantic memory category — not ephemeral context. The knowledge persists across sessions, across models, and across time.
What EEM Replaces
EEM vs RAG
RAG is external semantic memory but not epistemic. It retrieves content by similarity but has no justification chains, truth values, retraction cascades, or contradiction tracking. EEM adds the epistemic layer that RAG lacks.
EEM vs Context Windows
Conversation history and context windows are ephemeral — lost at session boundaries, destroyed by compaction. EEM persists across sessions and model swaps. Context compaction destroys justification networks (quantified across 33 measured compaction events).
EEM vs Parametric Knowledge
In-parameter knowledge has no audit trail. EEM makes "how do you know that?" answerable by justification chain traversal.
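As a rough illustration of justification chain traversal, the sketch below walks a belief's antecedents depth-first and prints the chain. The Node type, the explain function, and the example node IDs and texts are hypothetical, and the graph is assumed to be acyclic; this is not the output format of the reasons CLI.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    text: str
    antecedents: list[str] = field(default_factory=list)  # empty for premises


def explain(nodes: dict[str, Node], node_id: str, depth: int = 0) -> list[str]:
    """Answer "how do you know that?" by walking antecedents depth-first."""
    node = nodes[node_id]
    kind = "derived" if node.antecedents else "premise"
    lines = [f"{'  ' * depth}{node.node_id} ({kind}): {node.text}"]
    for parent in node.antecedents:
        lines.extend(explain(nodes, parent, depth + 1))
    return lines


# Hypothetical example network: two observations support one conclusion.
nodes = {
    "obs-1": Node("obs-1", "Service logs show timeouts after the deploy"),
    "obs-2": Node("obs-2", "The deploy changed the connection pool size"),
    "concl-1": Node("concl-1", "The pool size change caused the timeouts",
                    ["obs-1", "obs-2"]),
}
trace = explain(nodes, "concl-1")
print("\n".join(trace))
```

Every line of the trace names a node ID, so each step of the answer can be inspected or challenged on its own.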
EEM vs Self-Assessed Confidence
LLM self-assessed confidence does not track accuracy. The correlation r between stated confidence and correctness is weak or negative across 4 models: Sonnet r=0.198, Opus r=-0.182 (negative, so worse than random), Flash r=0.219, Pro r=0.121. Answer and confidence come from the same process, which is the same structural flaw as human overconfidence (Kahneman). EEM replaces "am I sure?" with "is this justified?", shifting from unreliable confidence to auditable justification chains.
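For readers who want to reproduce this kind of measurement, here is a minimal sketch of how such an r could be computed. It assumes r is a Pearson correlation between stated confidence and a 0/1 correctness flag; the source does not say which coefficient was used. The sample values are made-up toy data, not the measurements reported above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data (illustrative only): stated confidence and whether the answer was right.
confidence = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9]
correct = [1, 0, 0, 1, 1, 0]

r = pearson_r(confidence, correct)
print(f"r = {r:.3f}")
```

A value near 0 means confidence carries little information about correctness; a negative value means higher confidence goes with lower accuracy.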
How It Works
Truth Maintenance System (TMS)
EEM is built on Doyle's (1979) Truth Maintenance System architecture: SL justifications with antecedents, propagation cascades, retraction cascades, and an exogenous problem-solver slot. The TMS substrate is content-agnostic by design.
Hybrid Architecture
The implementation is a hybrid TMS: symbolic TMS handles structure (justifications, propagation, cascades, backtracking, challenge/defend) while LLMs handle semantic operations (derive generates beliefs, review-beliefs critiques them, contradiction detection finds nogoods). Putting an LLM in the TMS problem-solver slot is what Doyle's architecture prescribes.
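One way to picture the split is a symbolic store that accepts a pluggable problem solver. The sketch below is an assumption-laden illustration: the ProblemSolver signature, the HybridTMS class, and the derived-N naming are all hypothetical, and a deterministic stub stands in for the LLM. The structural side records proposals and rejects any that cite unknown nodes; the semantic side only proposes.

```python
from typing import Callable

# Hypothetical slot signature: given {node_id: text}, the solver proposes
# (new_text, cited_node_ids) pairs. In EEM this role is played by an LLM.
ProblemSolver = Callable[[dict[str, str]], list[tuple[str, list[str]]]]


class HybridTMS:
    def __init__(self, solver: ProblemSolver) -> None:
        self.solver = solver
        self.text: dict[str, str] = {}
        self.antecedents: dict[str, list[str]] = {}
        self._derived = 0

    def add_premise(self, node_id: str, text: str) -> None:
        self.text[node_id] = text
        self.antecedents[node_id] = []

    def derive(self) -> list[str]:
        """Semantic step is delegated; structural recording happens here."""
        added = []
        for text, cited in self.solver(dict(self.text)):
            # Structural check: every proposal must cite existing nodes.
            if not cited or any(c not in self.text for c in cited):
                continue
            self._derived += 1
            node_id = f"derived-{self._derived}"
            self.text[node_id] = text
            self.antecedents[node_id] = list(cited)
            added.append(node_id)
        return added


def stub_solver(beliefs: dict[str, str]) -> list[tuple[str, list[str]]]:
    # Stand-in for an LLM call: one grounded proposal, one ungrounded.
    return [
        ("Pool size change explains the timeouts", ["obs-1", "obs-2"]),
        ("Unsupported guess", ["no-such-node"]),
    ]


tms = HybridTMS(stub_solver)
tms.add_premise("obs-1", "Timeouts began after the deploy")
tms.add_premise("obs-2", "The deploy changed the pool size")
new_ids = tms.derive()
print(new_ids)
```

Because the store never depends on what the solver is, the LLM can be swapped without touching the recorded justifications.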
Key Mechanisms
- SL Justification: a justification is valid when ALL of its antecedents are IN and, where an outlist is used, every outlist node is OUT. Multiple justifications are allowed, and a node stays IN if ANY of them is valid. The outlist is what enables non-monotonic reasoning.
- Retraction Cascade: when a node goes OUT, all dependents whose justifications become invalid also go OUT, automatically and transitively. Retract one belief and the network figures out what else falls.
- Nogoods: a set of nodes that cannot all be IN simultaneously. When one is detected, dependency-directed backtracking traces backward through the justification chains and retracts the responsible premise with the fewest dependents (minimal disruption).
- Challenge/Defend: dialectical argumentation. Challenging a node makes it go OUT; defending neutralizes the challenge. Multi-level chains are supported. Unlike retract, a challenge preserves the original argument.
- Restoration: when a retracted node comes back IN, its dependents are recomputed, with no manual rederivation needed.
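The mechanisms above can be sketched in a few dozen lines, under simplifying assumptions that should be read as such: justifications carry an inlist only (the outlist is omitted so labeling is a simple monotone fixpoint), labels are recomputed from scratch instead of propagated incrementally, and challenge/defend is left out. The class and method names are hypothetical, not the internals of the reasons CLI, and resolve_nogood is a brute-force stand-in for dependency-directed backtracking: it tries each premise and picks the one whose retraction breaks the nogood while taking the fewest other nodes OUT.

```python
class TMS:
    def __init__(self):
        self.justifications = {}  # node -> list of antecedent tuples
        self.premises = set()     # premises currently asserted

    def add_premise(self, node):
        self.justifications.setdefault(node, [])
        self.premises.add(node)

    def justify(self, node, antecedents):
        self.justifications.setdefault(node, []).append(tuple(antecedents))

    def retract(self, premise):
        self.premises.discard(premise)

    def restore(self, premise):
        self.premises.add(premise)

    def in_nodes(self):
        """Least fixpoint: a node is IN if ANY justification has ALL antecedents IN."""
        in_set = set(self.premises)
        changed = True
        while changed:
            changed = False
            for node, justs in self.justifications.items():
                if node in in_set:
                    continue
                if any(all(a in in_set for a in j) for j in justs):
                    in_set.add(node)
                    changed = True
        return in_set

    def _in_without(self, premise):
        # Labels as they would be if this premise were retracted.
        self.premises.discard(premise)
        try:
            return self.in_nodes()
        finally:
            self.premises.add(premise)

    def resolve_nogood(self, nogood):
        """If every nogood node is IN, retract the cheapest premise that breaks it."""
        nogood = set(nogood)
        before = self.in_nodes()
        if not nogood <= before:
            return None
        options = {}
        for p in sorted(self.premises):
            after = self._in_without(p)
            if not nogood <= after:
                options[p] = len(before - after) - 1  # dependents lost
        if not options:
            return None
        culprit = min(options, key=options.get)
        self.retract(culprit)
        return culprit


tms = TMS()
for p in ("a", "b", "c"):
    tms.add_premise(p)
tms.justify("x", ["a", "b"])  # x holds via a AND b ...
tms.justify("x", ["c"])       # ... or via c alone
tms.justify("y", ["x"])
tms.justify("z", ["a"])

tms.retract("c")
after_c = tms.in_nodes()      # x survives through its other justification
tms.retract("a")
after_a = tms.in_nodes()      # cascade: x, y, and z all fall
tms.restore("a")
restored = tms.in_nodes()     # dependents come back without rederivation
culprit = tms.resolve_nogood(["y", "z"])
```

In this run, retracting c changes nothing because x has a second valid justification, retracting a cascades through x, y, and z, and restoring a brings them all back. For the nogood {y, z}, premise b is chosen because removing it takes out fewer dependents than removing a.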
Derive-then-Review
Derive deliberately over-generates, review then catches errors, and retraction cascades propagate the corrections. Both roles overshoot: derive over-generates, review over-retracts. Working through candidate retractions is where insights hide. Each review round retracts 13-37% of derived beliefs, so the system finds and removes its own errors.
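The control flow of the loop can be sketched as follows. Everything here is a hypothetical stand-in: derive and review would be LLM-backed in practice, the stubs are deterministic, cascades are not modeled, and the 25% rate the stubs produce is an artifact of the stub, not a measurement.

```python
def derive_then_review(beliefs, derive, review, rounds):
    """Alternate over-generation and critique; return beliefs and per-round retraction rates."""
    rates = []
    for _ in range(rounds):
        candidates = derive(beliefs)
        rejected = set(review(candidates))
        beliefs = beliefs + [c for c in candidates if c not in rejected]
        rates.append(len(rejected) / len(candidates) if candidates else 0.0)
    return beliefs, rates


def stub_derive(beliefs):
    # Stand-in for an LLM: always proposes four candidates per round.
    return [f"derived-{len(beliefs)}-{i}" for i in range(4)]


def stub_review(candidates):
    # Stand-in for a critic: rejects the first candidate of each batch.
    return candidates[:1]


final_beliefs, retraction_rates = derive_then_review(
    ["premise-a", "premise-b"], stub_derive, stub_review, rounds=2
)
print(retraction_rates)
```

Tracking the rate per round gives a simple signal for when further review rounds stop finding errors.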
Measured Results
Model Compensation
EEM compensates for model size: Sonnet+beliefs approximates Opus without beliefs. Haiku with dual-path retrieval reaches 94% A+B, close to Opus at 98%. Smaller models with EEM come close to larger models without it.
Expert Prompt Paradox
Telling an agent it is an expert reduces belief utilization. Beliefs alone outperform beliefs + expert prompt: Opus 100% vs 94.2%, Sonnet 94.2% vs 91.8%. The humble generic prompt produces better results because the agent consults the knowledge base instead of trusting its "expertise."
Self-Critique Failure
LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error evaluates the error. EEM externalizes the critic's judgments, replacing internal self-assessment with external structured tracking.
Architecture
Dual-Path Retrieval
EEM is queried via dual-path retrieval: a TMS path (pre-computed beliefs) and an FTS (full-text search) path over source chunks, merged by a third pass. Each path stays within the cognitive budget.
Cognitive Budget
Borrowed from graphics frame budgets: decompose work into focused passes (TMS pass, RAG pass, merge pass) each within the model's attention budget. Mixing beliefs and document chunks in a single prompt degrades performance (Opus drops 95.5% to 86%). Three focused passes achieve 100%.
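A minimal sketch of the three focused passes, assuming the model is available as a plain prompt-to-text callable. The LLM type alias, the three_pass_answer function, and the prompt wording are hypothetical; a recording stub replaces the model so the example runs offline. The point it illustrates is that beliefs and document chunks never share a prompt.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def three_pass_answer(question: str, beliefs: list[str],
                      chunks: list[str], llm: LLM) -> str:
    # Pass 1 (TMS path): beliefs only, no raw documents.
    belief_draft = llm(
        "Answer from these beliefs only:\n" + "\n".join(beliefs)
        + f"\nQuestion: {question}"
    )
    # Pass 2 (source chunk path): document excerpts only, no beliefs.
    chunk_draft = llm(
        "Answer from these source excerpts only:\n" + "\n".join(chunks)
        + f"\nQuestion: {question}"
    )
    # Pass 3 (merge): sees the two drafts, not the raw inputs.
    return llm(
        "Merge these two drafts into one answer.\n"
        f"Draft A: {belief_draft}\nDraft B: {chunk_draft}\n"
        f"Question: {question}"
    )


prompts: list[str] = []


def recording_llm(prompt: str) -> str:
    # Stub model: records each prompt and returns a numbered placeholder.
    prompts.append(prompt)
    return f"draft-{len(prompts)}"


answer = three_pass_answer(
    "Why did latency rise?",
    ["[n1] Pool size was reduced"],
    ["deploy.log: pool_size=5"],
    recording_llm,
)
```

Each prompt stays small and single-purpose, which is the property the frame-budget analogy is after.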
Expert Pipeline
Chunk source material, propose beliefs, human accepts, derive connections, review derivations, export. Value accrues at each stage. Derive produces new knowledge — connections the source doesn't make explicit.
Multi-Agent TMS
Import another agent's beliefs with SL justifications that include agent:active as an antecedent. An imported node is IN iff the agent is active AND the original belief is justified. This extends Doyle-style truth maintenance across agents.
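The import rule can be sketched as below. The agent/node namespacing, the import_agent function, and the shape of the remote snapshot are assumptions made for illustration; only the agent:active antecedent comes from the description above. Labels are computed with a simple inlist-only fixpoint.

```python
def in_nodes(premises, justifications):
    """A node is IN if it is an asserted premise or any justification has all antecedents IN."""
    in_set = set(premises)
    changed = True
    while changed:
        changed = False
        for node, justs in justifications.items():
            if node not in in_set and any(all(a in in_set for a in j) for j in justs):
                in_set.add(node)
                changed = True
    return in_set


def import_agent(justifications, agent, remote):
    """Import a remote snapshot: node id -> antecedent ids ([] for a remote premise).

    Every imported justification gains '<agent>:active' as an extra antecedent.
    """
    active = f"{agent}:active"
    for node, antecedents in remote.items():
        local = [active] + [f"{agent}/{a}" for a in antecedents]
        justifications.setdefault(f"{agent}/{node}", []).append(local)
    return active


justifications = {}
remote_snapshot = {"p": [], "q": ["p"]}  # hypothetical beliefs held by agent-b
active = import_agent(justifications, "agent-b", remote_snapshot)

premises = {active}
while_active = in_nodes(premises, justifications)

premises.discard(active)                 # agent-b is deactivated
after_deactivation = in_nodes(premises, justifications)
```

Deactivating the agent is a single retraction, and everything imported from it goes OUT through the ordinary cascade.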
Model Stacking
Model A generates candidates, TMS records with provenance, review critiques (machine + human), Model B receives validated beliefs, derives new beliefs, review critiques derivations, repeat. Each level is a full model pass with fresh context and a critique pipeline as quality gate.
For AI Agents
LLM agents use EEM by:
- Querying beliefs via reasons search / reasons show / reasons explain before answering
- Citing node IDs for auditability
- Running reasons derive to generate new beliefs from existing ones
- Running reasons review-beliefs to self-audit
- Recording nogoods with reasons nogood when contradictions appear
The agent does not need to be told it is an expert — the knowledge base speaks for itself.
Two CLIs
- beliefs: Structured markdown KB with provenance and manual maintenance. Simple, flat. Use for independent facts.
- reasons: Full TMS with automatic propagation, cascades, backtracking, and LLM-driven operations. Use for justified conclusions with dependency chains.
Architecture Pattern
Use the reasons database for all structural operations (add, retract, derive, review). Export to beliefs.md for querying (fast, human-readable, grep-able). Keep both in sync via reasons export-markdown.
Getting Started
- reasons init: creates reasons.db
- Add premises from observations: reasons add node-id "observation text"
- Add justified conclusions with --sl to link dependencies: reasons add conclusion "derived text" --sl premise-a,premise-b
- Use reasons derive to find connections the source doesn't make explicit
- Use reasons review-beliefs to audit; expect a 13-37% retraction rate
- Retract when evidence changes: reasons retract node-id (cascades propagate automatically)
Construction cost is O(chunks) + O(beliefs × rounds), paid once and then amortized across all queries. Expensive to build, cheap to query at scale.
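The amortization argument can be written out as a short worked formula. The constants c_1, c_2, c_3 (cost per chunk, per belief per review round, and per query) and the symbols N, R, and Q are introduced here for illustration and do not appear in the source.

```latex
C_{\text{build}} = c_1 \, N_{\text{chunks}} + c_2 \, N_{\text{beliefs}} \, R,
\qquad
\frac{C_{\text{total}}(Q)}{Q} = \frac{C_{\text{build}}}{Q} + c_3
```

As the number of queries Q grows, the build term shrinks toward zero and the per-query cost approaches c_3, which is the sense in which the one-time construction cost pays for itself at scale.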
Theoretical Foundations
- Doyle (1979) — Truth Maintenance Systems with SL justifications, propagation, retraction cascades, and an exogenous problem-solver slot.
- de Kleer (1986) — ATMS uses assumption-based environments and nogoods. TMS beats ATMS for EEM because revision matters more than multiple environments when the problem solver (LLM) produces 13-37% errors.
- AGM (Alchourrón, Gärdenfors, Makinson 1985) — formal theory for rational belief revision. Entrenchment scoring in backtracking is a crude approximation of AGM.
- McCarthy & Hayes (1969) — frame problem: what persists across state changes. Staleness checking addresses this by detecting when source files change under beliefs.
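Staleness checking could be approximated as follows. This is a sketch under a stated assumption: that each belief records a content hash of its source file at creation time. The source does not say how changes are detected, so the SHA-256 digest, the function names, and the recorded mapping are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes, used as a change detector."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_beliefs(recorded: dict[str, tuple[Path, str]]) -> list[str]:
    """recorded maps belief id -> (source path, digest when the belief was created)."""
    stale = []
    for belief_id, (path, digest) in recorded.items():
        # A missing file or a changed digest both mean the source moved under the belief.
        if not path.exists() or file_digest(path) != digest:
            stale.append(belief_id)
    return sorted(stale)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "notes.md"
    source.write_text("pool size is 10")
    recorded = {"belief-1": (source, file_digest(source))}

    fresh = stale_beliefs(recorded)     # nothing has changed yet
    source.write_text("pool size is 50")  # source edited under the belief
    stale = stale_beliefs(recorded)

print(fresh, stale)
```

A belief flagged this way is a candidate for review or retraction, at which point the usual cascade decides what else is affected.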