RAG vs agent memory: what's the difference?
RAG retrieves static documents. Agent memory stores evolving knowledge. Here's where they overlap, where they diverge, and when to use each.
RAG vs agent memory
Retrieval-Augmented Generation (RAG) and agent memory both solve the same core problem: LLMs forget. Context windows are finite, and once a conversation exceeds them, knowledge is lost.
But they solve it in fundamentally different ways. Understanding the distinction matters because choosing the wrong approach — or using only one when you need both — leads to agents that either can't access external knowledge or can't remember what they've learned.
What RAG does
RAG is a pattern where you retrieve relevant documents from an external store and inject them into the LLM's context before generating a response.
The typical RAG pipeline:
- Ingest: Split documents into chunks, generate embeddings, store in a vector database.
- Retrieve: When the user asks a question, embed the query and find the most similar chunks.
- Generate: Pass the retrieved chunks plus the user's question to the LLM. The model generates a response grounded in the retrieved content.
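The three steps above can be sketched in a few lines. This is a toy illustration, not a production pipeline: a bag-of-words counter stands in for a real embedding model, and a plain list stands in for a vector database.

```python
# Minimal RAG sketch: ingest -> retrieve -> build a grounded prompt.
from collections import Counter
import math
import re

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real pipeline would
    # call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingest: chunk documents, embed each chunk, store in the "index".
chunks = [
    "Invoices are due within 30 days of receipt.",
    "Refunds are processed within 5 business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve: embed the query, rank chunks by similarity.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Generate: the retrieved chunk is injected into the prompt.
top = retrieve("When are invoices due?")
prompt = f"Context:\n{top[0]}\n\nQuestion: When are invoices due?"
```

Swap in a real embedding model and a vector store and the shape stays the same; only the `embed` function and the index change.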
RAG works well for knowledge bases, documentation, and any scenario where you need the model to answer questions about a corpus of text it wasn't trained on.
The key property of RAG: the knowledge source is external and static. Documents are ingested, indexed, and retrieved. The RAG system doesn't create new knowledge — it surfaces existing knowledge at the right time.
What agent memory does
Agent memory stores the agent's own evolving knowledge — facts it has learned, decisions it has made, preferences users have expressed, and context from past conversations.
Unlike RAG, agent memory is:
- Read-write: The agent creates, updates, and retires facts during normal operation.
- Scoped: Facts are organized by lifetime and visibility — user preferences persist forever, task scratch notes expire when the task completes.
- Self-updating: When facts change, the old version is superseded. The agent always works with current information.
The key property of agent memory: the knowledge source is internal and dynamic. The agent builds its own knowledge base through interaction.
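The read-write property can be shown with a minimal in-memory store. The class and method names here are illustrative, not a real API, and keyword matching stands in for semantic search; the point is that a fact written mid-conversation is retrievable immediately, with no separate ingestion pipeline.

```python
# Sketch of read-write agent memory: writing is part of normal operation.
class MemoryStore:
    def __init__(self) -> None:
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        # No batch ingestion step: the write happens in the hot path.
        self.facts.append(fact)

    def recall(self, keyword: str) -> list[str]:
        # Toy retrieval: keyword match stands in for semantic search.
        return [f for f in self.facts if keyword.lower() in f.lower()]

memory = MemoryStore()
memory.remember("User prefers dark mode.")  # learned mid-conversation
prefs = memory.recall("dark")               # available immediately
```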
Where they overlap
Both systems:
- Store information outside the LLM's context window
- Typically use embeddings and semantic search for retrieval
- Inject retrieved content into the prompt before generation
- Reduce hallucination by grounding responses in real data
If you squint, agent memory looks like RAG where the agent is both the author and the consumer of the documents.
Where they diverge
Updates
RAG is essentially read-only from the LLM's perspective. Documents are ingested by a separate pipeline. When source material changes, you re-index.
Agent memory is read-write in the hot path. The agent writes facts during conversation, and those facts are immediately available for retrieval in the same session or future sessions. There's no separate ingestion pipeline — writing is part of the agent's normal operation.
Scoping
RAG typically retrieves from a single pool, where all chunks compete for relevance equally. There's no built-in concept of "this chunk is only relevant in this user's context" or "this chunk expires after this session."
Agent memory scopes facts by lifetime and visibility:
| Scope | Visible to | Lifetime |
|---|---|---|
| Task | Current task only | Deleted when task completes |
| Session | Current session | Deleted when session ends |
| User | All sessions for this user | Permanent |
| Agent | All users, all sessions | Permanent |
This means the agent doesn't waste tokens on irrelevant facts. A user's preference isn't competing with another user's preference for retrieval slots.
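Scope-filtered retrieval, as described in the table above, can be sketched as a visibility check applied before ranking. Everything here (`Scope`, `ScopedMemory`, the field names) is illustrative, not a real db0 API:

```python
# Sketch of scoped memory: a fact is only visible inside its scope.
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    TASK = "task"
    SESSION = "session"
    USER = "user"
    AGENT = "agent"

@dataclass
class Fact:
    text: str
    scope: Scope
    owner: str  # task id, session id, or user id, depending on scope

class ScopedMemory:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def visible(self, task: str, session: str, user: str) -> list[Fact]:
        # Agent-scoped facts are visible everywhere; the rest only
        # within their own task, session, or user.
        return [
            f for f in self.facts
            if f.scope is Scope.AGENT
            or (f.scope is Scope.USER and f.owner == user)
            or (f.scope is Scope.SESSION and f.owner == session)
            or (f.scope is Scope.TASK and f.owner == task)
        ]

mem = ScopedMemory()
mem.facts.append(Fact("Prefers TypeScript", Scope.USER, "alice"))
mem.facts.append(Fact("Prefers Go", Scope.USER, "bob"))

# Alice's retrieval never sees Bob's preference.
alice_view = mem.visible(task="t1", session="s1", user="alice")
```

Ranking and token budgeting would then run only over the visible subset, which is why one user's facts never compete with another's.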
Fact evolution
When a RAG source document is updated, you re-chunk, re-embed, and re-index. The old version is typically replaced entirely. There's no history, no audit trail, no concept of "this fact superseded that fact."
Agent memory tracks evolution explicitly. When the agent learns that a user switched from VS Code to Cursor, it doesn't delete the old fact — it marks it as superseded. The current fact wins in search, but the history is preserved. This matters for:
- Debugging: Why did the agent recommend X? Because at the time, fact Y was current.
- Audit: What did the agent know, and when did it know it?
- Rollback: If a fact was incorrectly superseded, restore it.
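Supersession with preserved history can be sketched as an append-only log where old facts are marked rather than deleted. The names here (`FactLog`, `assert_fact`) are hypothetical, for illustration only:

```python
# Sketch of explicit fact evolution: supersede, don't delete.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Fact:
    text: str
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    superseded_by: "Fact | None" = None

class FactLog:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, text: str, replaces: "Fact | None" = None) -> Fact:
        fact = Fact(text)
        if replaces is not None:
            replaces.superseded_by = fact  # mark, don't delete
        self.facts.append(fact)
        return fact

    def current(self) -> list[Fact]:
        # Only non-superseded facts win in search...
        return [f for f in self.facts if f.superseded_by is None]

log = FactLog()
old = log.assert_fact("User's editor is VS Code")
new = log.assert_fact("User's editor is Cursor", replaces=old)
# ...but `old` stays in the log for debugging, audit, and rollback.
```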
State management
RAG doesn't manage state. It retrieves documents and that's it. If your agent needs to checkpoint its progress, branch into parallel exploration paths, or recover from bad decisions, RAG can't help.
Agent memory systems (at least the ones that go beyond simple key-value storage) integrate state management. Checkpointing, branching, and recovery are part of the same system that handles memory.
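One way checkpointing and recovery can sit alongside memory is snapshot-based, sketched below. This is an assumed design for illustration, not any specific system's implementation:

```python
# Sketch of checkpoint/restore over agent memory (snapshot-based).
import copy

class AgentState:
    def __init__(self) -> None:
        self.facts: list[str] = []
        self.checkpoints: dict[str, list[str]] = {}

    def checkpoint(self, name: str) -> None:
        # Snapshot memory before a risky exploration path.
        self.checkpoints[name] = copy.deepcopy(self.facts)

    def restore(self, name: str) -> None:
        # Recover from a bad decision by rolling memory back.
        self.facts = copy.deepcopy(self.checkpoints[name])

state = AgentState()
state.facts.append("Chose approach A")
state.checkpoint("before-exploration")
state.facts.append("Approach A failed at step 3")  # exploration went wrong
state.restore("before-exploration")                # roll back cleanly
```

Branching falls out of the same mechanism: restore the same checkpoint into two states and let each explore a different path.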
When to use each
Use RAG when:
- You have a corpus of existing documents (docs, knowledge base, legal contracts, research papers)
- The knowledge is created by humans, not by the agent
- Updates are infrequent and batched
- You don't need per-user or per-session scoping
- The agent needs to answer questions about the documents, not about its own history
Use agent memory when:
- The agent needs to remember things it learned during conversation
- Users expect the agent to know their preferences, past decisions, and context
- Facts change frequently and the agent needs to handle contradictions
- Different contexts need different subsets of knowledge
- The agent needs to operate across multiple sessions without losing continuity
Use both when:
- The agent answers questions about external documents (RAG) and remembers user context (memory)
- The agent needs to ground responses in reference material while also tracking conversation history
- You're building a production agent that interacts with real users over time
Most production agents need both. The mistake is treating them as interchangeable.
How db0 handles this
db0 is an agent memory system, not a RAG framework. It handles the read-write, scoped, self-updating side of the equation.
But db0's context assembly layer can work alongside RAG. The context().pack() API assembles relevant memories into the token budget, leaving room for RAG-retrieved documents. You bring your own RAG pipeline — db0 handles the memory side.
The combination:
- RAG retrieves relevant documents from your knowledge base
- db0 retrieves relevant memories (user preferences, past decisions, session context)
- Both are packed into the context window, each allocated a portion of the token budget
- The LLM generates a response grounded in both external knowledge and agent memory
This is better than either alone. RAG without memory means the agent can't learn. Memory without RAG means the agent can't access external knowledge.
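The combined assembly can be sketched as a budget split between the two sources. The 70/30 split and the word-count "tokenizer" below are assumptions for illustration; a real system would use the model's actual tokenizer and relevance-ranked inputs:

```python
# Sketch of splitting a token budget between RAG chunks and memories.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def pack(items: list[str], budget: int) -> list[str]:
    # Greedily pack items (assumed pre-sorted by relevance) into budget.
    packed, used = [], 0
    for item in items:
        cost = count_tokens(item)
        if used + cost <= budget:
            packed.append(item)
            used += cost
    return packed

def assemble_context(rag_chunks: list[str], memories: list[str],
                     total_budget: int = 1000) -> list[str]:
    rag_budget = int(total_budget * 0.7)       # reference material
    memory_budget = total_budget - rag_budget  # user/session context
    return pack(rag_chunks, rag_budget) + pack(memories, memory_budget)
```

Each source gets a guaranteed slice of the window, so a long document can't crowd out the user's preferences and vice versa.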
Key takeaways
- RAG and agent memory solve different problems. RAG surfaces external documents. Memory stores internal, evolving knowledge.
- They're complementary, not competing. Most production agents need both.
- The critical differences are updates, scoping, and fact evolution. If your knowledge changes frequently, is user-specific, or needs an audit trail, you need memory — not just RAG.
- Don't use RAG as a substitute for memory. Storing agent facts in a RAG pipeline means no scoping, no superseding, and no state management. It works until it doesn't.
Further reading
- What is AI agent memory? — The foundational guide to agent memory systems.
- Memory scopes for AI agents — Why flat memory breaks down and how scopes fix it.
- db0 vs Mem0 — How db0 compares to Mem0's approach to agent memory.