Every AI agent that persists across interactions needs memory. The question is not whether agents should remember things -- it is what form that memory takes, what guarantees it provides, and what fails when the memory mechanism is wrong for the task. The answer depends on the kind of work the agent is doing. An agent helping a user draft an email has different memory requirements than an agent approving financial transactions or managing customer account states.
The field has converged on a small number of memory mechanisms, but the differences between them are poorly understood. Teams default to whatever their framework provides -- typically conversation history with optional RAG retrieval -- without analyzing whether the mechanism matches the accountability requirements of their use case. When the use case demands auditability, provenance, or deterministic recall, and the memory mechanism cannot provide those properties, the result is not a performance degradation. It is a governance failure.
This article builds a taxonomy of four agent memory mechanisms, compares them across five governance dimensions, and provides a decision guide for selecting the right mechanism for the right task. It concludes with a worked example that demonstrates the practical difference between RAG retrieval and structured typed facts for a customer-data lookup in a regulated context.
The Four Memory Mechanisms
Agent memory is not a single capability. It is a design space with at least four distinct mechanisms, each with different properties, different failure modes, and different governance characteristics. Understanding the taxonomy is a precondition for making sound architectural decisions.
1. Conversation History
Conversation history is the simplest form of agent memory: an ordered sequence of message turns (user messages, assistant responses, tool calls, tool results) maintained within or across sessions. Most agent frameworks provide this as the default memory mechanism. LangChain's memory abstractions, for example, include conversation buffer memory, conversation summary memory, and windowed conversation memory -- all variants of turn-sequence storage.
The properties of conversation history are: ordered (turns have a temporal sequence), lossy (older turns are dropped or summarized when the context window fills), unstructured (the content of each turn is natural language, not typed data), and session-scoped (history typically resets between sessions unless explicitly persisted). Conversation history answers the question "what has been said?" It does not reliably answer "what is currently true?" because it conflates statements, corrections, and outdated information in the same linear sequence.
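The lossy and unstructured properties can be seen in a minimal sketch of windowed turn storage. The `WindowedHistory` class and the sample turns below are hypothetical illustrations, not the API of LangChain or any other framework.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    role: str     # "user", "assistant", or "tool"
    content: str  # free-form natural language, not typed data


class WindowedHistory:
    """Hypothetical windowed buffer that keeps only the most recent turns."""

    def __init__(self, max_turns: int) -> None:
        # A deque with maxlen silently discards the oldest turn once full.
        self._turns = deque(maxlen=max_turns)

    def append(self, role: str, content: str) -> None:
        self._turns.append(Turn(role, content))

    def contents(self) -> list[str]:
        return [turn.content for turn in self._turns]


history = WindowedHistory(max_turns=3)
history.append("user", "Please send all notices to my work email.")
history.append("assistant", "Understood.")
history.append("user", "What is my current balance?")
history.append("assistant", "Your balance is shown in the portal.")

# The first turn (the notice preference) has fallen out of the window.
remaining = history.contents()
```

Nothing in the buffer signals that a stated preference was dropped; the agent simply no longer sees it.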
2. Vector Embeddings
Vector embedding memory stores information as high-dimensional numeric vectors that capture semantic similarity. When an agent needs to recall something, the query is embedded into the same vector space and the nearest neighbors are retrieved. Pinecone's guide to vector search describes the mechanics in detail: embedding models convert text to vectors, vector databases index those vectors, and cosine similarity or dot product scores determine retrieval.
The properties of vector embeddings are: semantically flexible (similar concepts cluster together even if worded differently), opaque (the embedding is a numeric vector -- a human cannot inspect it and determine what fact it represents), provenance-weak (the original source text is stored alongside the vector, but the relationship between the retrieved text and the fact the agent needs is mediated by similarity score, not by explicit structure), and query-dependent (different queries retrieve different subsets, with no guarantee that the most relevant fact will rank highest).
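A small sketch of similarity-ranked retrieval shows why the ranking alone cannot separate a current fact from an outdated one. The three-dimensional vectors below are hand-made stand-ins, not output from a real embedding model, and the `top_k` helper is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors standing in for real embedding model output.
store = {
    "Customer reclassified to prime tier after review.": [0.9, 0.1, 0.0],
    "Customer classified as near-prime at account opening.": [0.8, 0.3, 0.1],
    "Billing address updated in support ticket.": [0.1, 0.2, 0.9],
}


def top_k(query_vec: list[float], k: int) -> list[tuple[str, float]]:
    """Rank stored texts by cosine similarity to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embedding of a query such as "customer credit status".
query_vec = [0.85, 0.2, 0.05]
results = top_k(query_vec, k=2)
```

In this toy setup both credit-related texts score almost identically, so the score carries no signal about which assessment is the current one.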
3. RAG Context Windows
Retrieval-Augmented Generation extends the agent's effective memory by retrieving relevant document chunks at query time and injecting them into the context window. RAG is often built on vector embeddings but is architecturally distinct: the retrieval step selects chunks, the augmentation step formats them into the prompt, and the generation step produces an output conditioned on those chunks.
The properties of RAG are: query-dependent (what the agent "remembers" changes with every query), chunk-bounded (the unit of retrieval is a document chunk, not a discrete fact), context-window-constrained (retrieved chunks compete with instructions and conversation history for limited context space), and non-deterministic at the retrieval boundary (small changes in query phrasing can produce different retrieval sets, leading to different agent outputs even when the underlying knowledge is unchanged).
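The retrieval and augmentation steps, and the sensitivity to query phrasing, can be sketched as follows. The retriever here ranks chunks by simple word overlap as a stand-in for vector search; the function names and sample chunks are assumptions made for illustration.

```python
def tokenize(text: str) -> set[str]:
    return {word.strip(".,?\"'").lower() for word in text.split()}


CHUNKS = [
    "Customer 12345 was classified as near-prime at account opening.",
    "Following 12-month review, customer 12345 reclassified to prime tier.",
    "Support ticket: customer 12345 reported a billing error.",
]


def retrieve(query: str, k: int) -> list[str]:
    """Toy retriever: rank chunks by word overlap (a stand-in for vector search)."""
    query_tokens = tokenize(query)
    ranked = sorted(CHUNKS, key=lambda c: len(query_tokens & tokenize(c)), reverse=True)
    return ranked[:k]


def augment(query: str, chunks: list[str]) -> str:
    """Format retrieved chunks into the prompt that the generation step receives."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Two phrasings of the same underlying question.
first = retrieve("What tier is customer 12345?", k=1)
second = retrieve("How is customer 12345 classified?", k=1)

prompt_a = augment("What tier is customer 12345?", first)
prompt_b = augment("How is customer 12345 classified?", second)
```

With a retrieval budget of one chunk, the first phrasing surfaces the prime reclassification and the second surfaces the near-prime onboarding note, so the generation step is conditioned on contradictory context even though the stored knowledge did not change.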
4. Structured Typed Facts (ATOMs)
Structured typed facts -- referred to as ATOMs in the decision infrastructure vocabulary -- are explicit name-type-value assertions with provenance metadata. An ATOM has a name (customer.credit_tier), a type (enum("prime", "near_prime", "subprime")), a value ("prime"), a provenance record (where the fact came from, when it was asserted, what system asserted it), a validity window (when the fact is considered current), and an access policy (who can read it, who can update it).
The properties of ATOMs are: typed (values conform to declared schemas), queryable by name (retrieval is deterministic -- ask for customer.credit_tier and get exactly that fact), auditable (every ATOM carries provenance), immutable once asserted (updates create new versions, previous values are preserved), and governed (access policies control who can read and write each fact). The cognitive architectures survey identifies the need for persistent, queryable, type-constrained knowledge stores as a prerequisite for agents operating in accountable contexts.
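One way to picture such a record is as an immutable data class looked up by exact key. This is a sketch under assumptions: the `Atom` class, the `readers` and `writers` fields, the role names, and the in-memory `store` are hypothetical, chosen only to mirror the properties listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class Atom:
    name: str
    type: str
    value: str
    asserted_at: datetime
    asserted_by: str
    provenance: str
    valid_until: datetime
    readers: frozenset[str]  # hypothetical access-policy fields
    writers: frozenset[str]


# Hypothetical in-memory store keyed by (customer_id, fact name).
store: dict[tuple[str, str], Atom] = {}

store[("12345", "customer.credit_tier")] = Atom(
    name="customer.credit_tier",
    type="enum(prime, near_prime, subprime)",
    value="prime",
    asserted_at=datetime(2026, 3, 15, 10, 0, tzinfo=timezone.utc),
    asserted_by="credit-review-pipeline-v3",
    provenance="annual_credit_review_batch_2026Q1",
    valid_until=datetime(2027, 3, 15, 10, 0, tzinfo=timezone.utc),
    readers=frozenset({"underwriting-agent"}),
    writers=frozenset({"ingestion-pipeline"}),
)


def get_atom(customer_id: str, name: str) -> Atom:
    """Lookup by exact key: no ranking and no similarity threshold."""
    return store[(customer_id, name)]


first = get_atom("12345", "customer.credit_tier")
second = get_atom("12345", "customer.credit_tier")
```

Repeated lookups return the same record, and because the data class is frozen, an update has to be expressed as a new record instead of an in-place edit.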
What Goes Wrong with Unstructured Memory in Accountable Contexts
When an agent relies solely on conversation history, vector embeddings, or RAG for decisions that require accountability, five failure modes emerge. These are not edge cases. They are structural consequences of using the wrong memory mechanism for the task.
Fact contradiction without resolution. Conversation history can contain contradictory statements. A user may say "my address is 123 Main Street" in turn 4 and "actually, my address is 456 Oak Avenue" in turn 19. The history contains both. The agent may use either, depending on context window positioning and attention patterns. A structured typed fact has one current value for customer.address; the previous value is archived with a timestamp, not competing for attention.
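The address case can be sketched with an append-only store that exposes exactly one current value while keeping earlier versions for audit. `VersionedFactStore` and its method names are hypothetical, and the timestamps are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FactVersion:
    value: str
    asserted_at: datetime


class VersionedFactStore:
    """Hypothetical append-only store: one current value, full history kept."""

    def __init__(self) -> None:
        self._versions: dict[str, list[FactVersion]] = {}

    def assert_fact(self, name: str, value: str, asserted_at: datetime) -> None:
        # Append only: earlier versions are never overwritten or deleted.
        self._versions.setdefault(name, []).append(FactVersion(value, asserted_at))

    def current(self, name: str) -> FactVersion:
        # The current value is the most recently asserted version.
        return max(self._versions[name], key=lambda v: v.asserted_at)

    def history(self, name: str) -> list[FactVersion]:
        return sorted(self._versions[name], key=lambda v: v.asserted_at)


facts = VersionedFactStore()
facts.assert_fact(
    "customer.address", "123 Main Street",
    datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc),
)
facts.assert_fact(
    "customer.address", "456 Oak Avenue",
    datetime(2026, 1, 5, 9, 20, tzinfo=timezone.utc),
)

current_address = facts.current("customer.address").value
archived = [v.value for v in facts.history("customer.address")[:-1]]
```

The correction replaces the current value without erasing the earlier one, so the two statements never compete for the agent's attention.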
Semantic drift in retrieval. Vector embedding retrieval returns results ranked by similarity, not by correctness or recency. A query about "customer credit status" might retrieve a chunk from six months ago that describes a different credit assessment methodology, because the semantic content is similar even though the factual content is outdated. The agent has no mechanism to distinguish "semantically similar" from "currently true."
Missing provenance. When an agent makes a consequential decision based on a fact retrieved from a RAG system, the audit question is: "Where did that fact come from, who asserted it, and when?" RAG retrieval provides a document chunk and a similarity score. It does not provide a structured provenance record. Reconstructing the provenance after the fact requires tracing back through document ingestion pipelines, which may or may not be possible.
Non-deterministic recall. Ask a RAG system the same question twice, with slightly different phrasing each time, and you may get different retrieved chunks. This means two evaluations of the same underlying decision can use different facts, producing different outcomes. For any decision that must be reproducible -- regulatory, financial, legal -- non-deterministic recall is a disqualifying property.
No access governance. Conversation history and vector stores typically have uniform access: the agent can read everything in the store. When different facts have different sensitivity levels -- PII, financial data, health information -- uniform access means the agent has access to everything regardless of whether the current task requires it. Typed facts with per-fact access policies enable the principle of least privilege at the knowledge layer, not just at the API layer.
The ATOM Model: Structure for Governed Memory
The ATOM model formalizes the requirements for memory that can be governed. Beyond its name and value, each ATOM carries four properties: an enforced type schema, a provenance record, a validity window, and an access policy. These properties are what differentiate a governed memory fact from a stored string.
Type schema. Every ATOM conforms to a declared schema that constrains its value to a specific type: enum, integer, decimal, string, boolean, timestamp, or a composite type. The schema is not advisory -- it is enforced at write time. An attempt to assert customer.credit_tier = 47 against an enum schema fails, because 47 is not a valid enum value. This prevents the class of errors where garbage data enters the knowledge store and propagates to decisions.
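Write-time enforcement for the enum case might look like the sketch below. The `SCHEMAS` registry, `validate_write` function, and `SchemaViolation` exception are hypothetical names; only Python's standard `enum` module is assumed.

```python
from enum import Enum


class CreditTier(Enum):
    PRIME = "prime"
    NEAR_PRIME = "near_prime"
    SUBPRIME = "subprime"


class SchemaViolation(ValueError):
    """Raised when a write does not conform to the declared schema."""


# Hypothetical schema registry: fact name -> declared type.
SCHEMAS = {"customer.credit_tier": CreditTier}


def validate_write(name: str, raw_value: object) -> Enum:
    """Coerce raw_value to the declared enum, or reject the write."""
    schema = SCHEMAS[name]
    try:
        return schema(raw_value)  # Enum lookup by value raises ValueError if unknown
    except ValueError as exc:
        raise SchemaViolation(
            f"{raw_value!r} is not a valid {schema.__name__} for {name}"
        ) from exc


accepted = validate_write("customer.credit_tier", "prime")

try:
    validate_write("customer.credit_tier", 47)
    rejected = False
except SchemaViolation:
    rejected = True
```

Rejecting the write at this boundary means a malformed value never reaches the store, so downstream decision logic does not need to defend against it.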
Provenance record. Each ATOM records its source: which system or process asserted it, when, based on what upstream data. The provenance is not free-text metadata; it is a structured record with defined fields. This enables the audit trail to extend from the decision back through the facts to the original data sources, closing the traceability gap that unstructured memory creates.
Validity window. An ATOM's validity window defines the time period during which the fact is considered current. A credit score ATOM might have a validity window of 30 days; after that, the fact is automatically treated as stale and must be refreshed before it can be used in a decision. Validity windows prevent the silent staleness problem where agents make decisions based on facts that are no longer current but are still sitting in the memory store.
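A validity check of this kind can be sketched in a few lines. The `TimedFact` class, the `require_current` guard, and the sample score and dates are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TimedFact:
    name: str
    value: int
    asserted_at: datetime
    valid_for: timedelta

    def is_current(self, now: datetime) -> bool:
        # Current from the assertion time up to, but not including, expiry.
        return self.asserted_at <= now < self.asserted_at + self.valid_for


class StaleFactError(RuntimeError):
    """Raised when a decision tries to use a fact outside its validity window."""


def require_current(fact: TimedFact, now: datetime) -> int:
    """Return the value only if the fact is still within its validity window."""
    if not fact.is_current(now):
        raise StaleFactError(f"{fact.name} is stale and must be refreshed")
    return fact.value


score = TimedFact(
    name="customer.credit_score",
    value=712,
    asserted_at=datetime(2026, 3, 15, tzinfo=timezone.utc),
    valid_for=timedelta(days=30),
)

usable = require_current(score, datetime(2026, 4, 1, tzinfo=timezone.utc))

try:
    require_current(score, datetime(2026, 5, 1, tzinfo=timezone.utc))
    stale_detected = False
except StaleFactError:
    stale_detected = True
```

Passing `now` explicitly keeps the check reproducible: an auditor can re-run it with the decision's timestamp and get the same answer.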
Access policy. Each ATOM has an associated access policy defining which agents, roles, or systems can read it and which can write it. A financial ATOM like customer.annual_income might be readable by underwriting agents and writable only by verified data ingestion pipelines. This implements least-privilege at the fact level, consistent with the data governance emphasis of the NIST AI Risk Management Framework.
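Per-fact access control can be sketched as a policy lookup performed on every read and write. The `GovernedStore` class, the role names, and the income figure are hypothetical and stand in for whatever identity system a real deployment would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    readers: frozenset[str]
    writers: frozenset[str]


class AccessDenied(PermissionError):
    """Raised when a role is not permitted to perform the operation."""


class GovernedStore:
    """Hypothetical store that checks a per-fact policy on every operation."""

    def __init__(self, policies: dict[str, AccessPolicy]) -> None:
        self._policies = policies
        self._values: dict[str, object] = {}

    def write(self, role: str, name: str, value: object) -> None:
        if role not in self._policies[name].writers:
            raise AccessDenied(f"{role} may not write {name}")
        self._values[name] = value

    def read(self, role: str, name: str) -> object:
        if role not in self._policies[name].readers:
            raise AccessDenied(f"{role} may not read {name}")
        return self._values[name]


store = GovernedStore({
    "customer.annual_income": AccessPolicy(
        readers=frozenset({"underwriting-agent"}),
        writers=frozenset({"ingestion-pipeline"}),
    ),
})

store.write("ingestion-pipeline", "customer.annual_income", 85000)
income = store.read("underwriting-agent", "customer.annual_income")

try:
    store.read("support-agent", "customer.annual_income")
    support_blocked = False
except AccessDenied:
    support_blocked = True
```

Because the policy is attached to the fact name, a support agent can share the same store as an underwriting agent without gaining access to income data.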
ATOMs vs. Embeddings: Five Governance Dimensions
The following comparison evaluates structured typed facts (ATOMs) against vector embeddings across five governance dimensions that matter for accountable agent systems. The comparison is not a judgment that one is universally better -- it is an analysis of which properties each mechanism provides and which it lacks.
| Governance Dimension | Vector Embeddings | Structured Typed Facts (ATOMs) |
|---|---|---|
| Deterministic Recall | No. Retrieval depends on query embedding and similarity threshold. Different queries can retrieve different facts. | Yes. Query by name returns the exact fact. The same query returns the same current value until a new version is asserted. |
| Provenance | Weak. Source document is stored, but the path from document chunk to asserted fact is not structured. | Strong. Each fact carries a structured provenance record: source system, assertion timestamp, upstream data reference. |
| Auditability | Difficult. Reconstructing what the agent "knew" at decision time requires replaying the embedding query and hoping the same chunks are retrieved. | Direct. The ATOM value at decision time is recorded in the decision trace. No replay needed. |
| Access Control | Coarse. Typically all-or-nothing access to the vector store. Namespace-level segmentation is possible but not per-fact. | Fine-grained. Per-fact access policies enable least-privilege at the knowledge layer. |
| Staleness Detection | None by default. Embeddings have no built-in expiration. A fact embedded two years ago is retrieved with the same weight as one embedded yesterday. | Built-in. Validity windows enforce temporal currency. Stale facts are flagged or excluded from decision evaluation automatically. |
The pattern that emerges is clear: vector embeddings excel at semantic flexibility and approximate retrieval over large, unstructured corpora. Typed facts excel at deterministic precision, provenance, and governance over the specific facts that consequential decisions depend on. The mechanisms serve different purposes. The error is using one where the other is required.
Decision Guide: Which Mechanism for Which Task
Not every agent task requires the same memory mechanism. The following guide maps task characteristics to the appropriate memory type. In practice, production agents often use multiple mechanisms simultaneously -- the question is which mechanism serves as the authoritative source for which class of information.
| Task Characteristic | Recommended Memory Mechanism | Rationale |
|---|---|---|
| Casual information retrieval, research assistance, content generation | RAG + Vector Embeddings | Semantic flexibility matters more than deterministic precision. Approximate recall is acceptable. |
| Multi-turn conversation continuity within a session | Conversation History | Turn ordering and dialogue context are the primary requirements. Formal provenance is not needed. |
| Decision-making over customer or account data (financial, medical, legal) | Structured Typed Facts (ATOMs) | Deterministic recall, provenance, auditability, and access control are non-negotiable for accountable decisions. |
| Compliance-sensitive operations where the audit question is "what did the agent know?" | Structured Typed Facts (ATOMs) | The audit trail must trace from decision to fact to source. Only typed facts with provenance records enable this. |
| Knowledge base search across large document corpora | RAG + Vector Embeddings | The volume and variety of source material makes embedding-based retrieval the pragmatic choice. |
| Hybrid: research phase followed by decision phase | RAG for research, ATOMs for decision inputs | Use RAG to gather candidate information; validate and promote relevant facts to ATOMs before the decision point. |
The hybrid pattern in the last row deserves emphasis. Many production workflows involve a research or information-gathering phase where RAG is appropriate, followed by a decision phase where structured facts are required. The architectural pattern is: use RAG to identify relevant information, validate that information against authoritative sources, assert the validated facts as typed ATOMs, and then evaluate decisions against those ATOMs. This pattern preserves the flexibility of RAG for discovery while ensuring the governance properties of typed facts for consequential decisions.
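The promotion step in this pattern might be sketched as a gate that a RAG-derived candidate must pass before it becomes a decision input. Everything named here is an assumption: the `SYSTEM_OF_RECORD` dictionary stands in for an authoritative source, and `Candidate`, `PromotedFact`, and `promote` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_TIERS = ("prime", "near_prime", "subprime")

# Hypothetical authoritative source, standing in for a system of record.
SYSTEM_OF_RECORD = {("12345", "customer.credit_tier"): "prime"}


@dataclass(frozen=True)
class Candidate:
    customer_id: str
    name: str
    value: str
    source_chunk: str  # the RAG chunk the value was extracted from


@dataclass(frozen=True)
class PromotedFact:
    customer_id: str
    name: str
    value: str
    asserted_by: str
    asserted_at: datetime
    provenance: str


def promote(candidate: Candidate, now: datetime) -> Optional[PromotedFact]:
    """Return a typed fact only if the candidate passes both checks."""
    if candidate.value not in ALLOWED_TIERS:
        return None  # fails the type schema
    key = (candidate.customer_id, candidate.name)
    if SYSTEM_OF_RECORD.get(key) != candidate.value:
        return None  # contradicts, or is absent from, the authoritative source
    return PromotedFact(
        customer_id=candidate.customer_id,
        name=candidate.name,
        value=candidate.value,
        asserted_by="promotion-step",  # hypothetical component name
        asserted_at=now,
        provenance="system_of_record",
    )


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
outdated = Candidate("12345", "customer.credit_tier", "near_prime",
                     "classified as near-prime at account opening")
confirmed = Candidate("12345", "customer.credit_tier", "prime",
                      "reclassified to prime tier")

rejected = promote(outdated, now)
fact = promote(confirmed, now)
```

The retrieved chunk serves only as a lead; the promoted fact records the authoritative source as its provenance, so the decision does not depend on retrieval ranking.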
Worked Example: Customer-Data Lookup with RAG vs. Typed ATOM
Consider a customer service agent that needs to determine whether a customer is eligible for a credit limit increase. The agent needs to know the customer's current credit tier, their account tenure, and their payment history classification. Compare how this plays out with RAG retrieval versus typed ATOM lookup.
RAG Retrieval Path
The agent issues a query: "What is the credit profile for customer 12345?" The RAG system embeds this query, searches the vector store, and retrieves the three highest-scoring chunks. The first chunk is from a customer onboarding document that mentions "Customer 12345 was classified as near-prime at account opening." The second chunk is from a credit review note: "Following 12-month review, customer 12345 reclassified to prime tier." The third chunk is from an unrelated support ticket that mentions customer 12345's name in a billing context.
The agent now has three chunks: two that give contradictory credit tier information (near-prime vs. prime) with no explicit indication of which is current, and a third that is irrelevant. The agent must infer which fact is authoritative -- typically by reading dates from the text content and reasoning about recency. If the agent infers correctly, it uses "prime." If it infers incorrectly, or if the document text does not include clear dates, it may use "near-prime." The decision trace, if one exists, would record "the agent retrieved these chunks and produced this output" -- but it cannot record which specific fact value was treated as authoritative, because no such designation exists in the retrieval results.
Typed ATOM Lookup Path
The agent requests: GET customer.credit_tier WHERE customer_id = 12345. The ATOM store returns:
{
  "name": "customer.credit_tier",
  "type": "enum(prime, near_prime, subprime)",
  "value": "prime",
  "asserted_at": "2026-03-15T10:00:00Z",
  "asserted_by": "credit-review-pipeline-v3",
  "provenance": "annual_credit_review_batch_2026Q1",
  "valid_until": "2027-03-15T10:00:00Z"
}

The agent receives one value, with explicit type, provenance, assertion timestamp, and validity window. There is no ambiguity about which value is current. The decision trace records the exact ATOM evaluated: name, type, value, provenance, and validity. If an auditor later asks "what credit tier did the agent use for customer 12345 in this decision?" the answer is a single record, not a reconstruction from retrieved chunks.
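Recording the evaluated ATOM in a decision trace could look like the following sketch. The eligibility rule (current and prime tier) and the trace layout are hypothetical; a real eligibility decision would also consider account tenure and payment history, as described above.

```python
import json
from datetime import datetime, timezone

# The record returned by the ATOM store in the example above.
atom = {
    "name": "customer.credit_tier",
    "type": "enum(prime, near_prime, subprime)",
    "value": "prime",
    "asserted_at": "2026-03-15T10:00:00Z",
    "asserted_by": "credit-review-pipeline-v3",
    "provenance": "annual_credit_review_batch_2026Q1",
    "valid_until": "2027-03-15T10:00:00Z",
}


def parse_utc(timestamp: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat works on older Python versions.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def evaluate(atom: dict, decided_at: datetime) -> dict:
    """Apply a hypothetical rule and return a trace that embeds the exact input."""
    is_current = decided_at < parse_utc(atom["valid_until"])
    eligible = is_current and atom["value"] == "prime"  # hypothetical rule
    return {
        "decision": "credit_limit_increase",
        "eligible": eligible,
        "decided_at": decided_at.isoformat(),
        "inputs": [dict(atom)],  # copy of the fact exactly as evaluated
    }


trace = evaluate(atom, datetime(2026, 6, 1, tzinfo=timezone.utc))
serialized = json.dumps(trace, indent=2)
```

Because the trace embeds a copy of the fact, the auditor's question is answered by reading `trace["inputs"]` without consulting the store again.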
The Difference in Practice
The RAG path is adequate for informational queries where approximate answers are acceptable. The ATOM path is necessary for governed decisions where the audit question is "what specific fact value did the system use, where did it come from, and was it current at the time of the decision?" The distinction is not about retrieval quality -- modern RAG systems are impressively capable. It is about the structural properties that governed decision-making requires: determinism, provenance, validity, and auditability.
Architecture Implications: Memory as a Governance Layer
The taxonomy above reveals that agent memory is not a single engineering problem. It is a governance problem: the choice of memory mechanism determines what the agent can prove about what it knew and when it knew it. Teams building agents for accountable domains need to treat memory architecture as a governance decision, not a framework default.
The practical steps are as follows. First, audit your current agent memory mechanisms and classify each one using the taxonomy above. Second, identify which agent decisions are consequential, meaning they affect customer accounts, financial outcomes, compliance obligations, or access rights. Third, for each consequential decision, verify that the memory mechanism providing the decision inputs has the required governance properties: deterministic recall, provenance, validity windows, and access control. Finally, where the mechanism falls short, design a migration path that promotes critical facts from unstructured stores to typed, governed facts.
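The verification step can be approximated with a small gap report. This sketch deliberately simplifies the comparison made earlier in the article: partial or weak support is counted as absent, and the property names, mechanism labels, and input names are hypothetical.

```python
REQUIRED = frozenset({
    "deterministic_recall",
    "provenance",
    "validity_windows",
    "access_control",
})

# Simplification: weak or partial support is treated as not provided.
PROVIDES = {
    "conversation_history": frozenset(),
    "vector_embeddings": frozenset(),
    "rag": frozenset(),
    "typed_facts": REQUIRED,
}


def governance_gaps(decision_inputs: dict[str, str]) -> dict[str, list[str]]:
    """Map each decision input to the required properties its mechanism lacks."""
    gaps: dict[str, list[str]] = {}
    for input_name, mechanism in decision_inputs.items():
        missing = sorted(REQUIRED - PROVIDES[mechanism])
        if missing:
            gaps[input_name] = missing
    return gaps


# Hypothetical inventory: which mechanism currently supplies each decision input.
inputs = {
    "customer.credit_tier": "rag",
    "customer.account_tenure": "typed_facts",
}
gaps = governance_gaps(inputs)
```

Inputs that appear in the report are the candidates for migration to typed, governed facts.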
The MemGPT research demonstrated that even in conversational contexts, hierarchical memory management with explicit promotion and demotion of facts between memory tiers produces more reliable agent behavior than flat context window management. The same principle applies at the governance level: facts that drive consequential decisions should be promoted from unstructured memory to structured, typed, governed memory stores -- not left in conversation buffers or vector databases where their provenance, validity, and access properties cannot be guaranteed.
For deeper exploration of the vocabulary used in this analysis, see ATOMs, EMUs, and the Decision Plane. For the mental model connecting typed facts to decision rules, see The ATOMs and EMUs Mental Model.
