Top 5 RAG
03 · Advanced

Agentic RAG

Retrieval becomes a plan, not a step.

When to use

Use it when a question needs multiple sources or steps ("this quarter’s revenue vs plan, and why the gap?"), dynamic tool choice (Vector/Web/SQL), or real-time data, or when the system must TAKE actions (open a ticket, call an API) rather than just read. ❌ Don’t use it for one-shot Q&A: it adds latency, cost, and failure points.

Real-world examples

  • Internal analytics (BI) assistant: ask "this quarter’s revenue vs plan?" → the agent runs SQL + reads explanatory docs.
  • Deep research: synthesize an answer on a topic from web + internal docs across multiple lookup steps.
  • Cross-system tech support: query Jira + wiki + logs to reconstruct an incident’s root cause.
  • Action assistant: not just answer but open a ticket / call an API after human confirmation.
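
The "after human confirmation" part of the action assistant can be sketched as a small gate in front of every state-changing call. This is a minimal illustration, not a prescribed design: `ProposedAction`, `READ_ONLY_ACTIONS`, `execute`, and the stub handlers are all hypothetical names, and the `confirm` callback stands in for whatever approval UI a real system would use.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ProposedAction:
    name: str              # which handler the agent wants to call
    args: dict[str, Any]   # keyword arguments for that handler


# Assumption: anything NOT listed here changes state and needs human approval.
READ_ONLY_ACTIONS = {"search_wiki"}

# Stub side-effect store so the example is self-contained.
TICKETS: list[str] = []


def open_ticket(title: str) -> str:
    """Stub for a real ticketing API call."""
    TICKETS.append(title)
    return f"ticket #{len(TICKETS)} opened: {title}"


def search_wiki(query: str) -> str:
    """Stub read-only lookup."""
    return f"wiki results for {query!r}"


HANDLERS: dict[str, Callable[..., str]] = {
    "open_ticket": open_ticket,
    "search_wiki": search_wiki,
}


def execute(
    action: ProposedAction,
    handlers: dict[str, Callable[..., str]],
    confirm: Callable[[ProposedAction], bool],
) -> str:
    """Run an agent-proposed action, asking a human before any write."""
    handler = handlers.get(action.name)
    if handler is None:
        # The agent may only call actions that were registered up front.
        return f"rejected: unknown action {action.name!r}"
    if action.name not in READ_ONLY_ACTIONS and not confirm(action):
        return "cancelled: human declined"
    return handler(**action.args)


if __name__ == "__main__":
    proposal = ProposedAction("open_ticket", {"title": "Checkout latency spike"})
    # A real confirm() would show the proposal to a person; here it auto-approves.
    print(execute(proposal, HANDLERS, confirm=lambda a: True))
```

Read-only lookups pass straight through, while writes run only after `confirm` returns true, so a declined proposal leaves no side effects.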

Diagram

Illustrative pipeline diagram; see the step-by-step description in the Pipeline flow section below.

Pipeline flow

  1. Query → Planner Agent
  2. Dynamically pick Tools: Vector Search · Web Search · SQL Database
  3. Reasoner Agent synthesizes
  4. Loop: agent loops until confident
  5. Final Answer

In plain words

Like handing a task to a capable research assistant instead of doing one Google lookup yourself. The assistant decides "which source to check first", opens each one, keeps digging when something’s missing, and only COMMITS to an answer once confident. The trade-off: high flexibility, but you must cap time/cost or the assistant "researches forever".

Concept A–Z

Classic RAG is a straight line: retrieve once → answer. Many real questions don’t fit that mold: they need multiple sources, sub-question decomposition, or re-querying when results fall short. Agentic RAG turns retrieval into a LOOP driven by an AGENT: a Planner Agent decides the next step (what to query, which tool to call — vector, web, SQL…), executes, observes the result, and LOOPS until a Reasoner Agent is confident enough to answer. It’s the ReAct/agent loop applied to retrieval: very flexible, but you must control cost, latency, and safety (the agent can take actions).

How it works

The Plan → Act → Observe loop

Unlike linear RAG, the agent decides each round based on what it has seen.

  • Plan: the LLM decides the next step (decompose the question, pick a tool, write a sub-query).
  • Act: call a tool (vector search / web search / SQL / API).
  • Observe: read results; if insufficient → loop with a new query; if enough → hand to the Reasoner to synthesize + cite.
  • Stop conditions (MANDATORY): stop as soon as any one is hit: a maximum of N rounds, a token/cost budget, or a "confident enough" signal.
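
The loop and its stop conditions can be sketched as follows. This is a minimal illustration under stated assumptions: `Step`, `State`, `run_loop`, and the `demo_*` stubs are hypothetical names, the planner and confidence check are plain functions standing in for LLM calls, the word count is a crude stand-in for real token accounting, and the stub tool outputs are made-up placeholder strings.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    tool: str    # which tool the planner picked
    query: str   # the sub-query to send to it


@dataclass
class State:
    question: str
    observations: list[str] = field(default_factory=list)
    rounds: int = 0
    tokens_used: int = 0


def run_loop(
    question: str,
    plan: Callable[[State], Step],
    tools: dict[str, Callable[[str], str]],
    is_confident: Callable[[State], bool],
    max_rounds: int = 5,
    token_budget: int = 2000,
) -> tuple[State, str]:
    """Plan -> Act -> Observe until any stop condition fires."""
    state = State(question)
    while True:
        # Stop conditions are checked before every round.
        if is_confident(state):
            return state, "confident"
        if state.rounds >= max_rounds:
            return state, "max_rounds"
        if state.tokens_used >= token_budget:
            return state, "budget"
        step = plan(state)                          # Plan
        observation = tools[step.tool](step.query)  # Act
        state.observations.append(observation)      # Observe
        state.rounds += 1
        state.tokens_used += len(observation.split())  # crude token proxy


# --- Stubs for illustration only; a real system would call an LLM here. ---
def demo_plan(state: State) -> Step:
    if state.rounds == 0:
        return Step("sql", "revenue vs plan, this quarter")
    return Step("vector", "why did revenue miss plan")


DEMO_TOOLS = {
    "sql": lambda q: "placeholder rows for: " + q,
    "vector": lambda q: "placeholder passage for: " + q,
}


def demo_confident(state: State) -> bool:
    # Pretend two observations are enough to answer.
    return len(state.observations) >= 2


if __name__ == "__main__":
    final_state, reason = run_loop(
        "this quarter's revenue vs plan and why the gap?",
        demo_plan, DEMO_TOOLS, demo_confident,
    )
    print(reason, final_state.rounds)
```

Returning the stop reason alongside the state lets the caller tell a confident answer apart from one that was cut off by the round or budget cap, which a Reasoner can then flag as partial.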

Tools as "retrievers"

Each source is a tool with a clear schema; the agent picks tools based on what the question needs.

  • Vector Search: internal semantic knowledge.
  • SQL/DB: precise, aggregated numbers (revenue, inventory).
  • Web Search: fresh/out-of-corpus info (but results must be verified and the tool sandboxed).
  • Each tool should return sources so the Reasoner can cite.
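
One way to picture "a tool with a clear schema that returns sources" is a small registry in which every tool has a name, a description the planner can read, and a result type that always carries its sources. This is a sketch only: `Source`, `ToolResult`, `Tool`, `REGISTRY`, `call_tool`, and `format_citations` are hypothetical names, and the tool bodies are stubs rather than real vector, SQL, or web backends.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Source:
    tool: str   # which tool produced this evidence
    ref: str    # doc id, URL, or SQL statement (format is an assumption)


@dataclass(frozen=True)
class ToolResult:
    text: str
    sources: tuple[Source, ...]


@dataclass(frozen=True)
class Tool:
    name: str
    description: str                  # what the planner reads to choose a tool
    run: Callable[[str], ToolResult]


# --- Stub backends; placeholders for real retrieval calls. ---
def vector_search(query: str) -> ToolResult:
    return ToolResult(f"stub passage about {query}",
                      (Source("vector_search", "doc:handbook#12"),))


def sql(query: str) -> ToolResult:
    # The executed statement itself serves as the citation.
    return ToolResult("stub rows", (Source("sql", query),))


def web_search(query: str) -> ToolResult:
    return ToolResult(f"stub snippet about {query}",
                      (Source("web_search", "https://example.com/article"),))


REGISTRY = {
    tool.name: tool
    for tool in (
        Tool("vector_search", "Internal semantic knowledge.", vector_search),
        Tool("sql", "Precise, aggregated numbers.", sql),
        Tool("web_search", "Fresh or out-of-corpus information.", web_search),
    )
}


def call_tool(name: str, query: str) -> ToolResult:
    """Dispatch to a registered tool and insist on citable output."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    result = REGISTRY[name].run(query)
    if not result.sources:
        raise ValueError(f"tool {name} returned no sources")
    return result


def format_citations(results: list[ToolResult]) -> list[str]:
    """Number every source so the Reasoner can cite them as [1], [2], ..."""
    refs = [source for result in results for source in result.sources]
    return [f"[{i}] {s.tool}: {s.ref}" for i, s in enumerate(refs, start=1)]


if __name__ == "__main__":
    results = [call_tool("vector_search", "refund policy"),
               call_tool("sql", "SELECT 1")]
    for line in format_citations(results):
        print(line)
```

Rejecting results without sources at the dispatch layer means the Reasoner never receives evidence it cannot cite.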
