The Almanack of Pablo

AI Builder and Data Storyteller

Notes on RAG (Part 1): Build a Solid Baseline

February 20, 2026

The reason Retrieval-Augmented Generation (RAG) exists is simple. Most real documents are too long to throw into one prompt. Even when they fit, performance often drops because the model has to scan too much irrelevant text before it can answer a focused question.

RAG changes that dynamic. Instead of asking the model to read everything, we first retrieve a small set of relevant passages. Then we ask for an answer grounded in that context. The model does less searching and more reasoning.

Why a Baseline Matters

Many teams jump straight to advanced tricks and lose track of what is helping. A baseline gives you a clean reference. If quality is poor, you can diagnose one layer at a time rather than tuning everything at once.

A practical baseline has four parts: chunk the source text, represent chunks and queries in the same semantic space, retrieve the closest chunks, and answer with those chunks as context.

Chunking Is an Editorial Choice

Chunking is not just preprocessing. It is content design. If chunks are too short, meaning gets cut apart. If chunks are too long, retrieval gets noisy and expensive.

Size based chunking with overlap is a strong first step because it works on almost any document. Later you can move to structure based chunking when your documents have reliable headings or sections.
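A minimal sketch of size-based chunking with overlap, splitting on words. The sizes here are illustrative defaults, not recommendations from any particular library.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks of `size` words, each sharing
    `overlap` words with the previous chunk so ideas are less likely to
    be cut apart at a boundary."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Because each chunk repeats the tail of the previous one, a sentence that straddles a boundary still appears whole in at least one chunk.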

Embeddings Are About Relative Meaning

Embeddings turn text into vectors so similar meaning ends up near similar meaning. This lets a question about incident handling find a passage about postmortems even if the exact words differ.

The key rule is consistency. Use the same embedding model for both stored chunks and incoming queries. If you mix models, distance in vector space stops being trustworthy.
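The consistency rule can be sketched as follows. A toy bag-of-words counter stands in for a real embedding model (a real model would also match passages whose wording differs from the query); the point is that the same `embed` function is applied to stored chunks and to incoming queries.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector. A real system would call
    one fixed embedding model here for both chunks and queries."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "postmortems document how incidents were handled",
    "the office kitchen is restocked on mondays",
]
chunk_vecs = [embed(c) for c in chunks]          # stored with embed()

query = "how are incidents handled"
scores = [cosine(embed(query), v) for v in chunk_vecs]  # same embed()
best = chunks[scores.index(max(scores))]
```

Swapping in a different `embed` for the query side would make the scores meaningless, which is exactly the mixed-model failure described above.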

Retrieval First, Generation Second

When a RAG answer is wrong, the first question should be about retrieval. Did we retrieve the right evidence? If not, prompt edits on the generation side will not fix the root cause.

In a healthy baseline, the retrieved chunks are visibly relevant before you ever read the final answer. That one habit saves a lot of debugging time.
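That inspection habit is easy to build into the pipeline. A small helper like this (names are illustrative) prints the top-k retrieved chunks with their scores before any generation step runs, so relevance can be judged by eye first.

```python
def show_retrieval(query: str, scored: list[tuple[float, str]],
                   k: int = 3) -> list[str]:
    """Print the top-k (score, chunk) pairs for eyeball inspection and
    return the chunks that would be passed to the generator."""
    top = sorted(scored, reverse=True)[:k]
    print(f"Query: {query}")
    for rank, (score, chunk) in enumerate(top, 1):
        print(f"  {rank}. ({score:.2f}) {chunk[:80]}")
    return [chunk for _, chunk in top]
```

If the printed chunks do not obviously relate to the query, stop and fix retrieval before touching the prompt.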

Common Early Failure Patterns

At this stage you usually see three issues: chunks that cut an idea in half, queries whose wording is too far from the stored text for retrieval to surface the right passage, and answers that drift beyond the retrieved context.

These are normal baseline failures. The goal of Part 1 is not perfection. The goal is a reliable system you can improve with confidence.

What to Watch Before Moving On

Before adding complexity, check three things. Retrieval relevance should look good by inspection. The answer should cite or mirror the retrieved context. The system should admit uncertainty when evidence is missing.

The takeaway from Part 1 is straightforward. Strong RAG starts with retrieval discipline. Once this is stable, advanced improvements actually pay off.

Continue to Part 2: Hybrid Retrieval, Reranking, and Evaluation

← Back to Blog