Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is an architecture in which a language model first retrieves relevant passages from a corpus and then conditions its generation on those passages. The retrieval step grounds the answer in source material, reducing hallucination and enabling citation.

A RAG system has two halves: a retriever (often a vector index of embeddings over your documents) and a generator (the language model). The retriever fetches the top-k most relevant chunks for a query; the generator writes an answer using those chunks as context.
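The two halves can be sketched in a few lines of pure Python. This is a toy illustration, not a real vector index: `embed` is a crude bag-of-words stand-in for a learned embedding model, and `generate` just builds the prompt a real system would send to an LLM. All function names here are illustrative assumptions.

```python
import math
import re

def embed(text: str) -> dict[str, float]:
    """Toy embedding: a term-frequency vector over lowercase words.
    A real system would use a trained embedding model instead."""
    vec: dict[str, float] = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retriever: return the top-k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]

def generate(query: str, chunks: list[str]) -> str:
    """Generator stand-in: a real system would pass this prompt to an LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "RAG retrieves passages before generating an answer.",
    "Vector indexes store embeddings of document chunks.",
    "The weather today is sunny.",
]
top = retrieve("how does RAG retrieve passages", corpus, k=2)
prompt = generate("how does RAG retrieve passages", top)
```

The design point is the separation: you can swap the retriever (BM25, a vector index, a hybrid) without touching the generator, and vice versa.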

For research workflows, the value of RAG is not "the model knows my paper" but "the model can cite the exact passage it used." The strongest RAG implementations expose those passages to the user as inline citations, so you can verify each claim.
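One common way to make citations verifiable is to tag each retrieved chunk with a source ID before it reaches the generator, and instruct the model to cite those IDs. The prompt format and chunk structure below are assumptions for illustration, not any specific library's API:

```python
# Sketch: attach a source ID to each retrieved passage so the generator
# can emit inline citations the user can check against the source.

def build_cited_prompt(query: str, chunks: list[tuple[str, str]]) -> str:
    """chunks: list of (source_id, passage) pairs."""
    context = "\n".join(f"[{sid}] {passage}" for sid, passage in chunks)
    return (
        "Answer the question using only the passages below. "
        "Cite each claim with its bracketed source ID, e.g. [doc1].\n\n"
        f"{context}\n\nQuestion: {query}"
    )

prompt = build_cited_prompt(
    "What reduces hallucination in RAG?",
    [
        ("doc1", "Grounding generation in retrieved text reduces hallucination."),
        ("doc2", "Inline citations let readers verify each claim."),
    ],
)
```

Because the IDs round-trip through the prompt, the application can later map each `[doc1]`-style marker in the model's answer back to the exact passage shown to the model.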

Common failure modes: retrieving too few chunks (missing cross-paper synthesis), retrieving too many (drowning the generator in context), or skipping re-ranking (the nearest embeddings are not always the most relevant passages).
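A standard mitigation for the last two failure modes is retrieve-then-rerank: fetch a generous candidate set cheaply, then apply a finer-grained scorer and keep only a few chunks. The `rerank_score` below is a crude word-overlap stand-in for a real cross-encoder model; the function names and cutoffs are assumptions for illustration:

```python
def rerank_score(query: str, chunk: str) -> float:
    """Stand-in scorer: fraction of query words appearing in the chunk.
    Real systems use a trained cross-encoder here."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def retrieve_then_rerank(query: str, candidates: list[str],
                         first_k: int = 50, final_k: int = 3) -> list[str]:
    # Stage 1: a cheap retriever would normally produce `candidates`
    # (the top first_k hits from a vector index); here they are given.
    pool = candidates[:first_k]
    # Stage 2: re-rank with the finer scorer and keep a small final set,
    # avoiding both "too few chunks" and "drowning the generator".
    pool.sort(key=lambda ch: rerank_score(query, ch), reverse=True)
    return pool[:final_k]

chunks = [
    "Re-ranking orders retrieved chunks by relevance.",
    "Embeddings approximate relevance but can misrank passages.",
    "Unrelated text about cooking pasta.",
]
best = retrieve_then_rerank("how does re-ranking order chunks", chunks, final_k=2)
```

The two-stage split is the point: the first stage is tuned for recall (cast a wide net), the second for precision (surface only what the generator should actually read).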

Related reading