RAG vs agent memory: what's the difference?
RAG retrieves static documents. Agent memory stores evolving knowledge. Here's where they overlap, where they diverge, and when to use each.
RAG vs agent memory
Retrieval-Augmented Generation (RAG) and agent memory both solve the same core problem: LLMs forget. Context windows are finite, and once a conversation exceeds them, knowledge is lost.
But they solve it in fundamentally different ways. Understanding the distinction matters because choosing the wrong approach — or using only one when you need both — leads to agents that either can't access external knowledge or can't remember what they've learned.
What RAG does
RAG is a pattern where you retrieve relevant documents from an external store and inject them into the LLM's context before generating a response.
The typical RAG pipeline:
- Ingest: Split documents into chunks, generate embeddings, store in a vector database.
- Retrieve: When the user asks a question, embed the query and find the most similar chunks.
- Generate: Pass the retrieved chunks plus the user's question to the LLM. The model generates a response grounded in the retrieved content.
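The three steps above can be sketched in a few lines. This is a toy illustration, not a production pipeline: a bag-of-words counter stands in for a real embedding model, and a plain list stands in for a vector database.

```python
# Minimal RAG sketch: ingest -> retrieve -> build a grounded prompt.
from collections import Counter
import math
import re

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real pipeline would
    # call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingest: chunk documents, embed each chunk, store in the "index".
chunks = [
    "Invoices are due within 30 days of receipt.",
    "Refunds are processed within 5 business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieve: embed the query, rank chunks by similarity.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Generate: the retrieved chunk is injected into the prompt.
top = retrieve("When are invoices due?")
prompt = f"Context:\n{top[0]}\n\nQuestion: When are invoices due?"
```

Swap in a real embedding model and a vector store and the shape stays the same; only the `embed` function and the index change.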
RAG works well for knowledge bases, documentation, and any scenario where you need the model to answer questions about a corpus of text it wasn't trained on.
The key property of RAG: the knowledge source is external and static. Documents are ingested, indexed, and retrieved. The RAG system doesn't create new knowledge — it surfaces existing knowledge at the right time.
What agent memory does
Agent memory stores the agent's own evolving knowledge — facts it has learned, decisions it has made, preferences users have expressed, and context from past conversations.
Unlike RAG, agent memory is:
- Read-write: The agent creates, updates, and retires facts during normal operation.
- Scoped: Facts are organized by lifetime and visibility — user preferences persist forever, task scratch notes expire when the task completes.
- Self-updating: When facts change, the old version is superseded. The agent always works with current information.
The key property of agent memory: the knowledge source is internal and dynamic. The agent builds its own knowledge base through interaction.
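The read-write property can be shown with a minimal in-memory store. The class and method names here are illustrative, not a real API, and keyword matching stands in for semantic search; the point is that a fact written mid-conversation is retrievable immediately, with no separate ingestion pipeline.

```python
# Sketch of read-write agent memory: writing is part of normal operation.
class MemoryStore:
    def __init__(self) -> None:
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        # No batch ingestion step: the write happens in the hot path.
        self.facts.append(fact)

    def recall(self, keyword: str) -> list[str]:
        # Toy retrieval: keyword match stands in for semantic search.
        return [f for f in self.facts if keyword.lower() in f.lower()]

memory = MemoryStore()
memory.remember("User prefers dark mode.")  # learned mid-conversation
prefs = memory.recall("dark")               # available immediately
```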
Where they overlap
Both systems:
- Store information outside the LLM's context window
- Typically use embeddings and semantic search for retrieval
- Inject retrieved content into the prompt before generation
- Reduce hallucination by grounding responses in real data
If you squint, agent memory looks like RAG where the agent is both the author and the consumer of the documents.
Where they diverge
Updates
RAG is essentially read-only from the LLM's perspective. Documents are ingested by a separate pipeline. When source material changes, you re-index.
Agent memory is read-write in the hot path. The agent writes facts during conversation, and those facts are immediately available for retrieval in the same session or future sessions. There's no separate ingestion pipeline — writing is part of the agent's normal operation.
Scoping
RAG typically retrieves from a single pool, where all chunks compete for relevance equally. There's no built-in concept of "this chunk is only relevant in this user's context" or "this chunk expires after this session."
Agent memory scopes facts by lifetime and visibility:
| Scope | Visible to | Lifetime |
|---|---|---|
| Task | Current task only | Deleted when task completes |
| Session | Current session | Deleted when session ends |
| User | All sessions for this user | Permanent |
| Agent | All users, all sessions | Permanent |
This means the agent doesn't waste tokens on irrelevant facts. A user's preference isn't competing with another user's preference for retrieval slots.
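Scope-filtered retrieval, as described in the table above, can be sketched as a visibility check applied before ranking. Everything here (`Scope`, `ScopedMemory`, the field names) is illustrative, not a real db0 API:

```python
# Sketch of scoped memory: a fact is only visible inside its scope.
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    TASK = "task"
    SESSION = "session"
    USER = "user"
    AGENT = "agent"

@dataclass
class Fact:
    text: str
    scope: Scope
    owner: str  # task id, session id, or user id, depending on scope

class ScopedMemory:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def visible(self, task: str, session: str, user: str) -> list[Fact]:
        # Agent-scoped facts are visible everywhere; the rest only
        # within their own task, session, or user.
        return [
            f for f in self.facts
            if f.scope is Scope.AGENT
            or (f.scope is Scope.USER and f.owner == user)
            or (f.scope is Scope.SESSION and f.owner == session)
            or (f.scope is Scope.TASK and f.owner == task)
        ]

mem = ScopedMemory()
mem.facts.append(Fact("Prefers TypeScript", Scope.USER, "alice"))
mem.facts.append(Fact("Prefers Go", Scope.USER, "bob"))

# Alice's retrieval never sees Bob's preference.
alice_view = mem.visible(task="t1", session="s1", user="alice")
```

Ranking and token budgeting would then run only over the visible subset, which is why one user's facts never compete with another's.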
Fact evolution
When a RAG source document is updated, you re-chunk, re-embed, and re-index. The old version is typically replaced entirely. There's no history, no audit trail, no concept of "this fact superseded that fact."
Agent memory tracks evolution explicitly. When the agent learns that a user switched from VS Code to Cursor, it doesn't delete the old fact — it marks it as superseded. The current fact wins in search, but the history is preserved. This matters for:
- Debugging: Why did the agent recommend X? Because at the time, fact Y was current.
- Audit: What did the agent know, and when did it know it?
- Rollback: If a fact was incorrectly superseded, restore it.
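Supersession with preserved history can be sketched as an append-only log where old facts are marked rather than deleted. The names here (`FactLog`, `assert_fact`) are hypothetical, for illustration only:

```python
# Sketch of explicit fact evolution: supersede, don't delete.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Fact:
    text: str
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    superseded_by: "Fact | None" = None

class FactLog:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, text: str, replaces: "Fact | None" = None) -> Fact:
        fact = Fact(text)
        if replaces is not None:
            replaces.superseded_by = fact  # mark, don't delete
        self.facts.append(fact)
        return fact

    def current(self) -> list[Fact]:
        # Only non-superseded facts win in search...
        return [f for f in self.facts if f.superseded_by is None]

log = FactLog()
old = log.assert_fact("User's editor is VS Code")
new = log.assert_fact("User's editor is Cursor", replaces=old)
# ...but `old` stays in the log for debugging, audit, and rollback.
```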
State management
RAG doesn't manage state. It retrieves documents and that's it. If your agent needs to checkpoint its progress, branch into parallel exploration paths, or recover from bad decisions, RAG can't help.
Agent memory systems (at least the ones that go beyond simple key-value storage) integrate state management. Checkpointing, branching, and recovery are part of the same system that handles memory.
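One way checkpointing and recovery can sit alongside memory is snapshot-based, sketched below. This is an assumed design for illustration, not any specific system's implementation:

```python
# Sketch of checkpoint/restore over agent memory (snapshot-based).
import copy

class AgentState:
    def __init__(self) -> None:
        self.facts: list[str] = []
        self.checkpoints: dict[str, list[str]] = {}

    def checkpoint(self, name: str) -> None:
        # Snapshot memory before a risky exploration path.
        self.checkpoints[name] = copy.deepcopy(self.facts)

    def restore(self, name: str) -> None:
        # Recover from a bad decision by rolling memory back.
        self.facts = copy.deepcopy(self.checkpoints[name])

state = AgentState()
state.facts.append("Chose approach A")
state.checkpoint("before-exploration")
state.facts.append("Approach A failed at step 3")  # exploration went wrong
state.restore("before-exploration")                # roll back cleanly
```

Branching falls out of the same mechanism: restore the same checkpoint into two states and let each explore a different path.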
When to use each
Use RAG when:
- You have a corpus of existing documents (docs, knowledge base, legal contracts, research papers)
- The knowledge is created by humans, not by the agent
- Updates are infrequent and batched
- You don't need per-user or per-session scoping
- The agent needs to answer questions about the documents, not about its own history
Use agent memory when:
- The agent needs to remember things it learned during conversation
- Users expect the agent to know their preferences, past decisions, and context
- Facts change frequently and the agent needs to handle contradictions
- Different contexts need different subsets of knowledge
- The agent needs to operate across multiple sessions without losing continuity
Use both when:
- The agent answers questions about external documents (RAG) and remembers user context (memory)
- The agent needs to ground responses in reference material while also tracking conversation history
- You're building a production agent that interacts with real users over time
Most production agents need both. The mistake is treating them as interchangeable.
How db0 handles this
db0 is an agent memory system, not a RAG framework. It handles the read-write, scoped, self-updating side of the equation.
But db0's context assembly layer can work alongside RAG. The context().pack() API assembles relevant memories into the token budget, leaving room for RAG-retrieved documents. You bring your own RAG pipeline — db0 handles the memory side.
The combination:
- RAG retrieves relevant documents from your knowledge base
- db0 retrieves relevant memories (user preferences, past decisions, session context)
- Both are packed into the context window, each allocated a portion of the token budget
- The LLM generates a response grounded in both external knowledge and agent memory
This is better than either alone. RAG without memory means the agent can't learn. Memory without RAG means the agent can't access external knowledge.
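The combined assembly can be sketched as a budget split between the two sources. The 70/30 split and the word-count "tokenizer" below are assumptions for illustration; a real system would use the model's actual tokenizer and relevance-ranked inputs:

```python
# Sketch of splitting a token budget between RAG chunks and memories.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def pack(items: list[str], budget: int) -> list[str]:
    # Greedily pack items (assumed pre-sorted by relevance) into budget.
    packed, used = [], 0
    for item in items:
        cost = count_tokens(item)
        if used + cost <= budget:
            packed.append(item)
            used += cost
    return packed

def assemble_context(rag_chunks: list[str], memories: list[str],
                     total_budget: int = 1000) -> list[str]:
    rag_budget = int(total_budget * 0.7)       # reference material
    memory_budget = total_budget - rag_budget  # user/session context
    return pack(rag_chunks, rag_budget) + pack(memories, memory_budget)
```

Each source gets a guaranteed slice of the window, so a long document can't crowd out the user's preferences and vice versa.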
Key takeaways
- RAG and agent memory solve different problems. RAG surfaces external documents. Memory stores internal, evolving knowledge.
- They're complementary, not competing. Most production agents need both.
- The critical differences are updates, scoping, and fact evolution. If your knowledge changes frequently, is user-specific, or needs an audit trail, you need memory — not just RAG.
- Don't use RAG as a substitute for memory. Storing agent facts in a RAG pipeline means no scoping, no superseding, and no state management. It works until it doesn't.
Further reading
- What is AI agent memory? — The foundational guide to agent memory systems.
- Memory scopes for AI agents — Why flat memory breaks down and how scopes fix it.
- db0 vs Mem0 — How db0 compares to Mem0's approach to agent memory.