Embedding and Knowledge Graph Generation

Once data is prepared (ingested, cleaned, enriched and chunked), the next layer focuses on creating structured knowledge representations. Which representations you build depends on whether you're using a traditional, vector-based (naive) RAG approach or a more complex GraphRAG approach.

This section covers two complementary approaches:

  • Embedding Creation: Generating vector embeddings for each data chunk using an embedding model. Embeddings are vectors: numerical representations of text that enable semantic similarity search. They are the cornerstone of most RAG systems and where RAG began.

  • Knowledge Graph Creation: Building a graph of entities (nodes) and their relationships (edges) from the ingested data. A knowledge graph captures facts and how things are connected (for example, a graph might link a “Customer” node to an “Order” node via a “placed” relationship).
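To make the embedding idea concrete, here is a minimal sketch of embedding chunks and ranking them by cosine similarity against a query. The `embed` function below is a toy bag-of-words stand-in; in a real pipeline you would call an actual embedding model and store the resulting vectors in a vector index.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse word-count vector.
    A real system would call an embedding model here instead."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Standard cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Each prepared chunk is embedded once and stored alongside its vector.
chunks = [
    "The vendor announced a price increase for next quarter",
    "Shipping costs will go up starting in January",
    "The office cafeteria serves lunch at noon",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# At query time, embed the query and rank chunks by similarity.
query = embed("price increase")
ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
print(ranked[0][0])  # the chunk mentioning the price increase ranks first
```

With a real embedding model, the second chunk about rising shipping costs would also score highly even though it shares no words with the query; that fuzzy semantic matching is exactly what embeddings add over keyword search.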

Both approaches can be used together to combine their strengths. Embeddings excel at capturing fuzzy semantic similarities (for example, finding text related to “price increase” even if worded differently), while a knowledge graph excels at representing explicit, exact relationships and enabling precise queries (for example, all customers in San Francisco who bought product X).

As a starting point with RAG, you'll at least want to create embeddings so you can experiment with vector search. That works well for non-critical applications where precision is less important. Once you have a mission-critical use case that demands high precision, however, you'll need to build a knowledge graph to deliver it.