Top 5 RAG
01 · Beginner → Intermediate

Hybrid RAG

Dense vectors meet sparse keywords.

When to use

When queries need both semantic understanding AND exact keyword/code/proper-noun matches (SKU, error codes, function names, version numbers, acronyms) — the DEFAULT upgrade when vector-only RAG keeps missing results that contain the exact term. ❌ Skip it if the corpus is small and queries are all natural language: plain vector search is enough.

Real-world examples

  • Internal docs / company wiki search: understand the question AND match exact file names and process codes.
  • Tech support: look up by exact error code ("ERR_2043") plus a natural-language symptom description.
  • E-commerce: find products by SKU/model code as well as natural description ("waterproof jacket").
  • Code/API docs lookup: match exact function/variable names while still grasping the described purpose.

Diagram

Illustrative pipeline diagram; see the step-by-step description in the Pipeline flow section below.

Pipeline flow

  1. Query
  2. Dense branch: Embedding Model → Vector DB → Dense Results
  3. Sparse branch: BM25 Index → Sparse Results
  4. Reciprocal Rank Fusion (RRF) merges both lists
  5. Top-K Chunks → LLM → Answer

In plain words

Like searching a library with TWO librarians at once: one understands what you MEAN (semantics), the other has memorized the exact shelf codes (precise keywords). Each hands you their list, then you merge — a book ranked high by BOTH is almost certainly the right one.

Concept A–Z

Vanilla RAG uses only dense vector embeddings: great at MEANING ("car" ≈ "automobile"), but it misses when a user types an exact error code, function name, or SKU that embeddings blur away. Hybrid RAG runs TWO retrievers in parallel: (1) dense, which ranks by semantic vector similarity; (2) sparse, which uses BM25, the classic keyword-ranking algorithm (Elasticsearch-style) that matches exact tokens. It then merges the two ranked lists with Reciprocal Rank Fusion. The result covers both "gist" and "exact text", which typically gives better recall than vector-only on queries that contain exact terms. The extra cost is modest: a second index to build and keep in sync, plus a fusion step.

How it works

Two retrieval branches

The same query enters two independent branches, then they merge.

  • Dense: query → embedding → ANN search (Approximate Nearest Neighbor — finds the closest vectors approximately but fast; via cosine/dot) in a Vector DB (pgvector, Qdrant, Pinecone…). Captures meaning, synonyms, paraphrases.
  • Sparse: BM25 over an inverted index (a reverse lookup "term → docs containing it"). Captures exact rare/specific terms; the rarer the token the higher its score (IDF — inverse document frequency).
  • Each branch returns its own top-N (e.g. N=50) — NOT yet cut to final K.
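
The two branches above can be sketched in plain Python. This is a minimal illustration, not a production retriever: the whitespace `tokenize` helper, the `BM25Index` class, the `dense_search` function, and the hand-made 3-dimensional vectors are all assumptions made for the example. The BM25 parameters `k1=1.5` and `b=0.75` are commonly used values, and the IDF uses the Lucene-style smoothed variant. A real dense branch would get its vectors from an embedding model and use ANN search in a vector DB instead of the brute-force scan shown here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems use a proper analyzer.
    return text.lower().split()


class BM25Index:
    """Tiny BM25 scorer over an inverted index (term -> {doc_id: term frequency})."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_len = {}
        self.postings = defaultdict(dict)
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            self.doc_len[doc_id] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[term][doc_id] = tf
        self.avg_len = sum(self.doc_len.values()) / len(self.doc_len)

    def search(self, query, top_n=50):
        n_docs = len(self.doc_len)
        scores = defaultdict(float)
        for term in set(tokenize(query)):
            posting = self.postings.get(term, {})
            if not posting:
                continue
            # Rarer terms (fewer docs in the posting list) get a larger IDF weight.
            idf = math.log(1 + (n_docs - len(posting) + 0.5) / (len(posting) + 0.5))
            for doc_id, tf in posting.items():
                # Length normalization: long documents are penalized slightly.
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores, key=scores.get, reverse=True)[:top_n]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_search(query_vec, doc_vecs, top_n=50):
    """Exact (brute-force) nearest neighbours by cosine; a vector DB would use ANN."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:top_n]


docs = {
    "d1": "restart the service when ERR_2043 appears in the log",
    "d2": "the application crashes on startup after an update",
    "d3": "how to reset your password",
}
# Hypothetical toy embeddings standing in for a real embedding model's output.
doc_vecs = {"d1": [0.9, 0.1, 0.0], "d2": [0.8, 0.3, 0.1], "d3": [0.0, 0.1, 0.9]}

sparse_results = BM25Index(docs).search("ERR_2043", top_n=50)
dense_results = dense_search([0.7, 0.4, 0.1], doc_vecs, top_n=50)
```

Each branch returns only a ranked list of document ids. That is all the fusion step needs, which is why the raw scores are discarded here.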

Merge with Reciprocal Rank Fusion (RRF)

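The two branches score on DIFFERENT scales (cosine similarity is bounded in [-1, 1], BM25 is unbounded) → you cannot add them directly. RRF uses only RANK: score(d) = Σ_i 1/(k + rank_i(d)), summed over the branches i that returned d, where rank_i(d) is the 1-based position of d in branch i and k≈60 is a smoothing constant. Docs ranked high in both branches rise to the top; no score normalization is needed, which makes the method robust to differences in score scale.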
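
As a minimal sketch of the RRF formula, the function below fuses any number of ranked id lists. The function name `reciprocal_rank_fusion`, the sample lists, and the tie-break by document id are assumptions made for the example; ranks are 1-based and a document missing from a branch simply contributes nothing for that branch.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based rank
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["d2", "d1", "d3"]   # hypothetical dense-branch ranking
sparse = ["d1", "d4", "d2"]  # hypothetical sparse-branch ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Here `d1` (ranks 2 and 1) edges out `d2` (ranks 1 and 3) because 1/62 + 1/61 is slightly larger than 1/61 + 1/63, and both beat `d4` and `d3`, which appear in only one list.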

Rerank + cut Top-K

After RRF, add a rerank step with a cross-encoder (cohere-rerank, bge-reranker): a model that reads the (query, doc) PAIR in one pass, so it judges relevance more accurately than comparing independently computed embeddings, but is slower per document. Run it only on the fused candidates, then take Top-K (3–8) into the prompt. Because it scores only a few dozen candidates, reranking usually improves precision for a modest latency cost.
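
The rerank-and-cut step can be sketched as a function that takes any pair scorer. Everything named here is hypothetical: `rerank_top_k` is an illustrative helper, and `overlap_score` is a toy token-overlap stand-in so the example runs without a model. In a real system `score_pair` would call a cross-encoder such as the ones named above.

```python
def rerank_top_k(query, candidates, score_pair, k=5):
    """Score each (query, doc text) pair, then keep the k best doc ids.

    candidates: list of (doc_id, text) in fused (RRF) order.
    score_pair: callable (query, text) -> float; higher means more relevant.
    """
    scored = [(score_pair(query, text), doc_id) for doc_id, text in candidates]
    # Python's sort is stable, so ties keep their original fused order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def overlap_score(query, doc):
    """Toy stand-in for a cross-encoder: fraction of query tokens found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


candidates = [
    ("d2", "the application crashes on startup after an update"),
    ("d1", "restart the service when ERR_2043 appears in the log"),
    ("d3", "how to reset your password"),
]
top = rerank_top_k("ERR_2043 in the log", candidates, overlap_score, k=2)
```

Passing the scorer as a parameter keeps the cut logic independent of the model, so swapping rerankers does not touch the rest of the pipeline.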
