External Epistemic Memory

Justified, persistent, auditable knowledge for LLMs

What is EEM?

External Epistemic Memory (EEM) is knowledge that lives outside the model, carries its justifications with it, and lets you understand how the system knows what it knows.

EEM is defined by three load-bearing properties: it is external (outside model parameters), epistemic (justified beliefs with truth values), and memory (persistent semantic knowledge). Each property is necessary. Together they distinguish EEM from the other approaches to LLM knowledge management compared below.

Three Properties

External

Knowledge lives outside model parameters, in a separate substrate. It survives compaction, model swaps, and session boundaries. It is separable, copyable, shareable, inspectable, editable, and auditable. Of these six traits, auditability is the most epistemically important: it makes "how do you know that?" answerable by justification chain traversal.

Epistemic

Knowledge is held not as bare facts but as justified beliefs with truth values (IN/OUT), retraction cascades, contradiction records (nogoods), and derivation depth. This is what distinguishes EEM from RAG, which is external semantic memory but not epistemic.
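A belief record carrying these epistemic fields might be modeled as below. This is a minimal sketch, not the reasons schema: the class names, the list-of-antecedents justification format, and the choice to measure derivation depth through a node's shallowest justification are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Label(Enum):
    IN = "IN"    # currently believed: at least one justification holds
    OUT = "OUT"  # not currently believed


@dataclass
class Belief:
    node_id: str
    text: str
    label: Label = Label.IN
    # Each justification is a list of antecedent node ids; premises have none.
    justifications: list[list[str]] = field(default_factory=list)


@dataclass(frozen=True)
class Nogood:
    # Node ids that must not all be IN at once (a recorded contradiction).
    nodes: frozenset[str]


def derivation_depth(node_id: str, beliefs: dict[str, Belief]) -> int:
    """Premises have depth 0; a derived belief sits one level above the deepest
    antecedent of its shallowest justification. Assumes an acyclic graph."""
    belief = beliefs[node_id]
    if not belief.justifications:
        return 0
    return 1 + min(
        # default=-1 makes an empty antecedent list count like a premise (depth 0)
        max((derivation_depth(a, beliefs) for a in just), default=-1)
        for just in belief.justifications
    )


def violated(nogood: Nogood, beliefs: dict[str, Belief]) -> bool:
    """A nogood is violated when every node in it is currently IN."""
    return all(beliefs[n].label is Label.IN for n in nogood.nodes)


kb = {
    "p1": Belief("p1", "Service X listens on port 8080"),
    "p2": Belief("p2", "Port 8080 is blocked by the firewall"),
    "p3": Belief("p3", "Service X is reachable from outside"),
    "c1": Belief("c1", "Service X is unreachable from outside", justifications=[["p1", "p2"]]),
    "c2": Belief("c2", "External clients of X will time out", justifications=[["c1"]]),
}
print(derivation_depth("c2", kb))                     # 2
print(violated(Nogood(frozenset({"p3", "c1"})), kb))  # True
```

Recording the p3/c1 conflict as a nogood, rather than silently keeping both, is what lets a later step decide which side to retract.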

Memory

Knowledge is persistent and structured, belonging to Tulving's semantic memory category rather than ephemeral context. It persists across sessions, across models, and across time.

What EEM Replaces

EEM vs RAG

RAG is external semantic memory but not epistemic. It retrieves content by similarity but has no justification chains, truth values, retraction cascades, or contradiction tracking. EEM adds the epistemic layer that RAG lacks.

EEM vs Context Windows

Conversation history and context windows are ephemeral: they are lost at session boundaries and destroyed by compaction. EEM persists across sessions and model swaps. Context compaction destroys justification networks (measured across 33 compaction events).

EEM vs Parametric Knowledge

In-parameter knowledge has no audit trail. EEM makes "how do you know that?" answerable by justification chain traversal.
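Justification chain traversal can be as simple as a recursive walk from a conclusion back to its premises. The sketch below assumes a toy in-memory graph (the texts and support dictionaries and the explain function are hypothetical) rather than the reasons database, and it follows only the one justification recorded per node.

```python
def explain(node_id: str, texts: dict[str, str], support: dict[str, list[str]],
            depth: int = 0) -> list[str]:
    """Answer 'how do you know that?' by walking antecedents back to premises.

    texts maps node ids to belief text; support maps a derived node to the
    antecedents of its justification (premises have no entry).
    Assumes the justification graph is acyclic.
    """
    antecedents = support.get(node_id, [])
    kind = "derived" if antecedents else "premise"
    lines = [f"{'  ' * depth}{node_id} ({kind}): {texts[node_id]}"]
    for antecedent in antecedents:
        lines.extend(explain(antecedent, texts, support, depth + 1))
    return lines


texts = {
    "p1": "Service X listens on port 8080",
    "p2": "Port 8080 is blocked by the firewall",
    "c1": "Service X is unreachable from outside",
}
support = {"c1": ["p1", "p2"]}
print("\n".join(explain("c1", texts, support)))
```

This prints c1 as derived, with p1 and p2 indented beneath it as premises: an audit trail that parametric knowledge cannot produce.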

EEM vs Self-Assessed Confidence

LLM self-assessed confidence does not track accuracy. This held across 4 models, with weak or negative correlations between stated confidence and accuracy: Sonnet r=0.198, Opus r=-0.182 (worse than random), Flash r=0.219, Pro r=0.121. Answer and confidence come from the same process, which is the same structural flaw behind human overconfidence (Kahneman). EEM replaces "am I sure?" with "is this justified?", shifting from unreliable confidence to auditable justification chains.
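The r values above are correlation coefficients. The source does not say exactly how they were computed; one common choice, assumed here, is Pearson's r between each answer's stated confidence and a 0/1 correctness score (which equals the point-biserial correlation). The sample data is invented for illustration only.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


# Illustrative data only: stated confidence per answer, and whether it was correct.
confidence = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60]
correct = [0, 1, 0, 1, 1, 0]
r = pearson_r(confidence, [float(c) for c in correct])
print(f"r = {r:.3f}")
```

Values near 0 mean stated confidence carries little information about correctness; a negative value means high confidence tends to accompany wrong answers.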

How It Works

Truth Maintenance System (TMS)

EEM is built on Doyle's (1979) Truth Maintenance System architecture: SL justifications with antecedents, propagation cascades, retraction cascades, and an exogenous problem-solver slot. The TMS substrate is content-agnostic by design.
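The core of SL labeling can be sketched in a few dozen lines. This is a simplified, monotonic sketch, not Doyle's full algorithm and not the reasons implementation: it supports only the inlist half of SL justifications (Doyle's outlists, which make belief non-monotonic, are omitted), it has no nogoods or backtracking, and it relabels from scratch instead of propagating incrementally. It does show how a retraction cascades: once a premise is withdrawn, every conclusion that depended on it loses its support and goes OUT.

```python
class TMS:
    """Minimal monotonic TMS sketch: SL justifications with inlists only."""

    def __init__(self) -> None:
        self.premises: set[str] = set()
        self.justifications: dict[str, list[list[str]]] = {}
        self.in_nodes: set[str] = set()

    def add_premise(self, node: str) -> None:
        self.premises.add(node)
        self._relabel()

    def add_justified(self, node: str, inlist: list[str]) -> None:
        # A node may have several justifications; any one valid one makes it IN.
        self.justifications.setdefault(node, []).append(list(inlist))
        self._relabel()

    def retract(self, node: str) -> set[str]:
        """Withdraw a premise; return every node that went OUT as a result."""
        before = set(self.in_nodes)
        self.premises.discard(node)
        self._relabel()
        return before - self.in_nodes

    def is_in(self, node: str) -> bool:
        return node in self.in_nodes

    def _relabel(self) -> None:
        # Least fixed point: premises are IN; a node becomes IN when every
        # antecedent of one of its justifications is IN. Nodes never reached
        # stay OUT, so an unsupported cycle cannot hold itself up.
        labels = set(self.premises)
        changed = True
        while changed:
            changed = False
            for node, justs in self.justifications.items():
                if node not in labels and any(all(a in labels for a in j) for j in justs):
                    labels.add(node)
                    changed = True
        self.in_nodes = labels


tms = TMS()
tms.add_premise("a")
tms.add_premise("b")
tms.add_justified("c", ["a", "b"])
tms.add_justified("d", ["c"])
print(tms.is_in("d"))            # True
print(sorted(tms.retract("a")))  # ['a', 'c', 'd']
```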

Hybrid Architecture

The implementation is a hybrid TMS: symbolic TMS handles structure (justifications, propagation, cascades, backtracking, challenge/defend) while LLMs handle semantic operations (derive generates beliefs, review-beliefs critiques them, contradiction detection finds nogoods). Putting an LLM in the TMS problem-solver slot is what Doyle's architecture prescribes.
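The division of labour can be expressed as an interface: the problem-solver slot only proposes, and the symbolic side decides what gets recorded. In the sketch below, Proposal, ProblemSolver, FakeSolver, and record are hypothetical names; FakeSolver returns fixed proposals where a real system would call an LLM. The structural rule shown (reject proposals whose antecedents are unknown) is one illustrative check, not a description of how reasons derive validates its output.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Proposal:
    node_id: str
    text: str
    antecedents: list[str]


class ProblemSolver(Protocol):
    """The exogenous problem-solver slot: proposes beliefs, never labels them."""

    def derive(self, beliefs: dict[str, str]) -> list[Proposal]: ...


class FakeSolver:
    """Stand-in for an LLM-backed solver; returns canned proposals."""

    def derive(self, beliefs: dict[str, str]) -> list[Proposal]:
        return [
            Proposal("c1", "Service X is unreachable from outside", ["p1", "p2"]),
            Proposal("c2", "Service X was decommissioned", ["p9"]),  # p9 does not exist
        ]


def record(solver: ProblemSolver, texts: dict[str, str],
           support: dict[str, list[str]]) -> list[str]:
    """Symbolic side: record only new, grounded proposals along with their justification."""
    accepted = []
    for p in solver.derive(dict(texts)):
        if p.node_id in texts or not p.antecedents:
            continue  # duplicates and unjustified claims are not recorded
        if all(a in texts for a in p.antecedents):
            texts[p.node_id] = p.text
            support[p.node_id] = list(p.antecedents)
            accepted.append(p.node_id)
    return accepted


texts = {"p1": "Service X listens on port 8080", "p2": "Port 8080 is blocked by the firewall"}
support: dict[str, list[str]] = {}
print(record(FakeSolver(), texts, support))  # ['c1']
```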

Key Mechanisms

Derive-then-Review

The loop is: over-derive, let review catch errors, and let retraction cascades propagate the corrections. Both roles overshoot: derive over-generates and review over-retracts. Working through the candidate retractions is where insights hide. Between 13% and 37% of derived beliefs are retracted per review round; the system finds and removes its own errors.
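The retraction arithmetic of one review round can be sketched as follows. The review step is represented by a hard-coded set of flagged beliefs (a stand-in for the LLM critique); review_round is a hypothetical helper, each derived belief is assumed to have a single justification, and the rate counts both flagged beliefs and those lost through the cascade, which is one possible definition.

```python
def review_round(derived: dict[str, list[str]],
                 flagged: set[str]) -> tuple[set[str], float]:
    """Retract flagged derived beliefs, cascade to dependents, report the rate.

    derived maps each derived belief to its antecedents (premises are not keys).
    """
    retracted = {n for n in flagged if n in derived}
    changed = True
    while changed:  # cascade until no remaining belief rests on a retracted one
        changed = False
        for node, antecedents in derived.items():
            if node not in retracted and any(a in retracted for a in antecedents):
                retracted.add(node)
                changed = True
    rate = len(retracted) / len(derived) if derived else 0.0
    return retracted, rate


derived = {
    "d1": ["p1"], "d2": ["d1", "p2"], "d3": ["p2"], "d4": ["p1", "p2"],
    "d5": ["p3"], "d6": ["d3"], "d7": ["p3"], "d8": ["d5"],
}
retracted, rate = review_round(derived, flagged={"d1"})
print(sorted(retracted), rate)  # ['d1', 'd2'] 0.25
```

Flagging one belief removed two here, because d2 rested on d1; this is why over-retraction by review deserves a second look before it is accepted.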

Measured Results

98.5% A/B grades across 3,853 questions with the dual-path architecture, and zero D/F grades: the failure tail was eliminated entirely.

88% vs 33%: expert-service with EEM scores 88% A grades versus 33% for the baseline on the same 50 questions, and is 15x faster.

40+ expert knowledge bases built, ranging from 237 beliefs (aap-expert) to 12,731 beliefs (redhat-expert).

Model Compensation

EEM compensates for model size: Sonnet with beliefs approximates Opus without beliefs. Haiku with dual-path retrieval achieves 94% A+B, close to Opus at 98%. Smaller models with EEM approach larger models without it.

Expert Prompt Paradox

Telling an agent it is an expert reduces belief utilization. Beliefs alone outperform beliefs + expert prompt: Opus 100% vs 94.2%, Sonnet 94.2% vs 91.8%. The humble generic prompt produces better results because the agent consults the knowledge base instead of trusting its "expertise."

Self-Critique Failure

LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error evaluates the error. EEM externalizes the critic's judgments, replacing internal self-assessment with external structured tracking.

Architecture

Dual-Path Retrieval

EEM is queried via dual-path retrieval: a TMS path (pre-computed beliefs) and an FTS path (source chunk search), merged by a third pass. Each path stays within the cognitive budget.
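A toy version of the three passes is sketched below. Both retrieval paths use naive token-overlap scoring as a stand-in (a real FTS path would use a full-text index, and the TMS path would search only IN beliefs), and the merge pass, which the source describes as a separate pass, is replaced by a deterministic concatenate-and-deduplicate step. All function names are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, docs: dict[str, str], k: int) -> list[str]:
    """Rank docs by how many query tokens they share; drop non-matches."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), doc_id) for doc_id, text in docs.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def dual_path(query: str, beliefs: dict[str, str], chunks: dict[str, str],
              k: int = 3) -> dict[str, list[str]]:
    tms_hits = search(query, beliefs, k)  # pass 1: pre-computed beliefs
    fts_hits = search(query, chunks, k)   # pass 2: raw source chunks
    merged: list[str] = []                # pass 3 stand-in: beliefs first, deduplicated
    for hit in tms_hits + fts_hits:
        if hit not in merged:
            merged.append(hit)
    return {"tms": tms_hits, "fts": fts_hits, "merged": merged}


beliefs = {"b1": "Service X is unreachable because port 8080 is blocked"}
chunks = {"c1": "Firewall rules block port 8080 by default", "c2": "Release notes for version 2"}
print(dual_path("why is port 8080 blocked", beliefs, chunks)["merged"])  # ['b1', 'c1']
```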

Cognitive Budget

Borrowed from graphics frame budgets: decompose work into focused passes (TMS pass, RAG pass, merge pass), each within the model's attention budget. Mixing beliefs and document chunks in a single prompt degrades performance (Opus drops from 95.5% to 86%). Three focused passes achieve 100%.
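Staying within a per-pass budget amounts to packing each pass separately rather than filling one mixed prompt. In this sketch the token count is approximated by whitespace-separated words (an assumption; a real system would use the model's tokenizer), items are assumed to arrive already ranked, and pack and plan_passes are hypothetical names. The merge pass would then receive the two passes' outputs under its own budget.

```python
def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def pack(items: list[str], budget: int) -> list[str]:
    """Greedily keep ranked items while the pass stays within its budget."""
    kept: list[str] = []
    used = 0
    for item in items:
        cost = approx_tokens(item)
        if used + cost > budget:
            continue  # skip this item; a smaller later one may still fit
        kept.append(item)
        used += cost
    return kept


def plan_passes(beliefs: list[str], chunks: list[str], budget: int) -> dict[str, list[str]]:
    # Separate passes instead of one mixed prompt: each gets the full budget.
    return {"tms_pass": pack(beliefs, budget), "rag_pass": pack(chunks, budget)}


plan = plan_passes(["a b c", "d e", "f g h i"], ["one two", "three four five six seven"], budget=5)
print(plan)  # {'tms_pass': ['a b c', 'd e'], 'rag_pass': ['one two']}
```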

Expert Pipeline

The pipeline runs in stages: chunk the source material, propose beliefs, have a human accept them, derive connections, review the derivations, and export. Value accrues at each stage. Derive produces new knowledge: connections the source doesn't make explicit.
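The stages can be wired together as plain functions. Everything here is hypothetical scaffolding: chunk packs blank-line-separated paragraphs into word-bounded chunks, and the propose, accept, derive, and review callables stand in for the LLM and human steps (trivial lambdas in the usage example).

```python
from typing import Callable


def chunk(source: str, max_words: int = 200) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_words words
    (an oversized single paragraph becomes its own chunk)."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in (p.strip() for p in source.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def run_pipeline(source: str,
                 propose: Callable[[str], list[str]],
                 accept: Callable[[str], bool],
                 derive: Callable[[list[str]], list[str]],
                 review: Callable[[str], bool]) -> list[str]:
    proposed = [b for c in chunk(source) for b in propose(c)]
    accepted = [b for b in proposed if accept(b)]  # human gate
    kept = [d for d in derive(accepted) if review(d)]  # derivations must pass review
    return accepted + kept  # what would be exported


exported = run_pipeline(
    "Port 8080 is blocked.\n\nService X uses port 8080.",
    propose=lambda c: [s.strip() for s in c.split(".") if s.strip()],
    accept=lambda b: True,
    derive=lambda bs: [" and ".join(bs)] if len(bs) > 1 else [],
    review=lambda d: True,
)
print(exported)
```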

Multi-Agent TMS

Import another agent's beliefs using SL justifications that include agent:active as an antecedent. A node is IN iff the agent is active AND the original belief is justified. This is Doyle-style truth maintenance across agents.
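The import rule reduces to an SL justification with a two-node inlist. In this sketch the naming scheme (alice:active for the activity node, alice@b1 for the local record of the agent's original belief, alice/b1 for the imported node) is hypothetical, and labels are computed for a single step rather than by full propagation.

```python
def import_justifications(agent: str, remote_ids: list[str]) -> dict[str, list[str]]:
    """One SL inlist per imported node: the agent is active AND its original belief holds."""
    return {f"{agent}/{bid}": [f"{agent}:active", f"{agent}@{bid}"] for bid in remote_ids}


def label(node: str, inlists: dict[str, list[str]], labels: dict[str, str]) -> str:
    """A node is IN iff every antecedent in its inlist is IN."""
    return "IN" if all(labels.get(a) == "IN" for a in inlists[node]) else "OUT"


inlists = import_justifications("alice", ["b1", "b2"])
labels = {"alice:active": "IN", "alice@b1": "IN", "alice@b2": "OUT"}
print(label("alice/b1", inlists, labels), label("alice/b2", inlists, labels))  # IN OUT

labels["alice:active"] = "OUT"  # the agent is deactivated
print(label("alice/b1", inlists, labels))  # OUT
```

Deactivating an agent withdraws everything imported from it in one step, without touching the beliefs themselves.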

Model Stacking

Model A generates candidates, TMS records with provenance, review critiques (machine + human), Model B receives validated beliefs, derives new beliefs, review critiques derivations, repeat. Each level is a full model pass with fresh context and a critique pipeline as quality gate.
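The stacking loop can be sketched with each level as a function from validated beliefs to new candidates. The names stack, Level, and critique are hypothetical, the critique is a single predicate standing in for the combined machine and human review, and provenance is reduced to recording which level produced each surviving belief.

```python
from typing import Callable

Level = tuple[str, Callable[[list[str]], list[str]]]


def stack(levels: list[Level], critique: Callable[[str], bool]) -> dict[str, str]:
    """Run each level on the beliefs validated so far; keep only what passes critique."""
    provenance: dict[str, str] = {}  # belief -> name of the level that produced it
    for name, model in levels:
        validated = list(provenance)  # the next level sees only validated beliefs
        for belief in model(validated):
            if belief not in provenance and critique(belief):
                provenance[belief] = name
    return provenance


levels: list[Level] = [
    ("model-a", lambda _: ["x", "y", "bad"]),
    ("model-b", lambda bs: [f"{a}+{b}" for a in bs for b in bs if a < b]),
]
print(stack(levels, critique=lambda b: "bad" not in b))
# {'x': 'model-a', 'y': 'model-a', 'x+y': 'model-b'}
```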

For AI Agents

LLM agents use EEM by:

The agent does not need to be told it is an expert — the knowledge base speaks for itself.

Two CLIs

beliefs
Structured markdown KB with provenance and manual maintenance. Simple, flat. Use for independent facts.
reasons
Full TMS with automatic propagation, cascades, backtracking, and LLM-driven operations. Use for justified conclusions with dependency chains.

Architecture Pattern

Use the reasons database for all structural operations (add, retract, derive, review). Export to beliefs.md for querying (fast, human-readable, grep-able). Keep both in sync via reasons export-markdown.
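An export step of this kind might look like the sketch below. The output format is an assumption for illustration and is not the actual format written by reasons export-markdown; export_markdown and the tuple layout are hypothetical. The point is that one line per IN belief, with its antecedents inline, keeps the file grep-able.

```python
def export_markdown(beliefs: dict[str, tuple[str, bool, list[str]]]) -> str:
    """beliefs maps node id -> (text, is_in, antecedents). Only IN beliefs are exported."""
    lines = ["# Beliefs", ""]
    for node_id in sorted(beliefs):
        text, is_in, antecedents = beliefs[node_id]
        if not is_in:
            continue  # OUT beliefs stay in the database but not in the query file
        source = f" (from: {', '.join(antecedents)})" if antecedents else " (premise)"
        lines.append(f"- **{node_id}**: {text}{source}")
    return "\n".join(lines) + "\n"


md = export_markdown({
    "p1": ("Port 8080 is blocked", True, []),
    "c1": ("Service X is unreachable", True, ["p1"]),
    "c0": ("Service X is healthy", False, []),
})
print(md)
```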

Getting Started

  1. reasons init — creates reasons.db
  2. Add premises from observations: reasons add node-id "observation text"
  3. Add justified conclusions with --sl to link dependencies: reasons add conclusion "derived text" --sl premise-a,premise-b
  4. Use reasons derive to find connections the source doesn't make explicit
  5. Use reasons review-beliefs to audit, and expect a 13-37% retraction rate
  6. Retract when evidence changes: reasons retract node-id — cascades propagate automatically

Construction cost is O(chunks) + O(beliefs × rounds) and is paid once; query cost is O(queries), so the build cost amortizes across all queries. Expensive to build, cheap to query at scale.
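The amortization argument is simple arithmetic: the one-time build cost is divided across every query served. The unit costs and counts below are hypothetical placeholders in abstract cost units (for example, model calls), not measurements from the source, and amortized_cost_per_query is a hypothetical name.

```python
def amortized_cost_per_query(chunks: int, beliefs: int, rounds: int, queries: int,
                             c_chunk: float = 1.0, c_review: float = 1.0,
                             c_query: float = 1.0) -> float:
    """Build cost O(chunks) + O(beliefs x rounds), spread over `queries` queries."""
    if queries <= 0:
        raise ValueError("queries must be positive")
    build = c_chunk * chunks + c_review * beliefs * rounds
    return build / queries + c_query


for q in (100, 10_000, 1_000_000):
    print(q, amortized_cost_per_query(chunks=1_000, beliefs=5_000, rounds=3, queries=q))
```

As the number of queries grows, the per-query cost falls toward c_query, the cost of a query alone.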

Theoretical Foundations