Build an AI Knowledge Stack in an Afternoon
A step-by-step build: Supabase + pgvector + an MCP server + a basic ingestion loop. Built in under three hours, runnable forever for around ten dollars a month.
What You'll Have When You're Done
The finished system is a production-ready semantic memory store: a Supabase-backed knowledge base using pgvector for embedding storage, connected to an AI agent via the Model Context Protocol (MCP). Tools like Claude Desktop or Cursor can then query your private documentation in real time.
The stack consists of four primary components: a PostgreSQL database with vector capabilities, a Python-based ingestion pipeline for processing Markdown files, PDFs, and chat logs, an MCP server acting as the bridge, and an LLM interface. Implementation typically requires under three hours of active development.
Operational costs remain low. With Supabase's free tier or basic plan and OpenAI's text-embedding-3-small model, total monthly expenditure ranges between $8 and $12, depending on the volume of documents processed. This provides a scalable foundation to build an AI knowledge stack without significant infrastructure overhead.
Step 1: Supabase and Schema
Initialization begins with a Supabase project. The core requirement is the pgvector extension, which transforms PostgreSQL into a vector database capable of performing cosine similarity searches on embeddings.
To build an AI knowledge stack, the schema must support both the raw content for LLM context and the vector representation for retrieval. Using a UUID primary key ensures compatibility across distributed systems, while JSONB allows for flexible metadata filtering (e.g., filtering by date or author) without altering the table structure.
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_entries (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding VECTOR(1536), -- 1536 dims matches OpenAI text-embedding-3-small
    source_uri TEXT NOT NULL,
    source_type TEXT,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- IVFFlat index for faster retrieval on medium datasets
CREATE INDEX ON knowledge_entries
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
```
The IVFFlat index is used for the initial setup because it builds quickly. For datasets exceeding roughly 100,000 rows, switching to an HNSW index is recommended to maintain high recall during semantic search.
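To illustrate the metadata filtering the JSONB column enables, a retrieval query can combine pgvector's cosine distance operator (`<=>`) with an ordinary JSONB predicate. This is a hedged sketch: `:query_embedding` stands in for a bound parameter your client supplies, and the `author` key is a hypothetical metadata field.

```sql
-- Five nearest chunks, restricted to a given author via JSONB metadata
SELECT content,
       1 - (embedding <=> :query_embedding) AS similarity
FROM knowledge_entries
WHERE metadata->>'author' = 'alice'
ORDER BY embedding <=> :query_embedding
LIMIT 5;
```

Because the filter is a plain WHERE clause, you can add new metadata keys at ingestion time without any schema migration.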
Step 2: Ingestion Pipeline
The ingestion pipeline converts unstructured data into searchable vectors. The process involves reading source files, splitting text into manageable chunks to avoid LLM context window overflow, and generating embeddings via an API provider like OpenAI or a local Nomic instance.
To prevent duplicate entries, the script uses upsert logic keyed on a hash of the content combined with the source_uri. For production environments, replacing basic paragraph splitting with LangChain's RecursiveCharacterTextSplitter is advised to maintain semantic coherence across chunks.
```python
import hashlib
import uuid

from openai import OpenAI
from supabase import create_client

client = OpenAI()
supabase = create_client("URL", "KEY")

def get_embedding(text: str) -> list[float]:
    resp = client.embeddings.create(input=text, model="text-embedding-3-small")
    return resp.data[0].embedding

def process_file(path: str, source_type: str = "markdown"):
    with open(path, "r") as f:
        content = f.read()

    # Simple paragraph splitting
    chunks = [c.strip() for c in content.split("\n\n") if c.strip()]

    for chunk in chunks:
        content_hash = hashlib.sha256(chunk.encode()).hexdigest()
        # Deterministic ID from source + content hash makes the upsert
        # idempotent: re-ingesting an unchanged chunk hits the same row.
        row_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{path}:{content_hash}"))
        supabase.table("knowledge_entries").upsert({
            "id": row_id,
            "content": chunk,
            "embedding": get_embedding(chunk),
            "source_uri": path,
            "source_type": source_type,
            "metadata": {"hash": content_hash},
        }).execute()

process_file("./docs/architecture.md")
```
This pipeline allows users to build an AI knowledge stack by simply dropping files into a directory and running the script, ensuring the vector store remains synchronized with the source documentation.
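If pulling in LangChain feels heavyweight, the core idea behind overlap-aware splitting can be approximated with the standard library. The sketch below is not the RecursiveCharacterTextSplitter algorithm, just a simplified stand-in: it slides a fixed-size window with an overlapping tail so context is not severed cleanly at every chunk boundary.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose tails overlap the next chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars
    return chunks

# 1200 characters with a 500-char window and 50-char overlap -> 3 chunks
chunks = chunk_text("x" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # 3
```

To use it in the pipeline above, swap the paragraph split for `chunk_text(content)`; the hash-based upsert keys work unchanged.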
Step 3: MCP Server
The Model Context Protocol (MCP) server acts as the interface between the LLM and the database. It exposes specific tools that the AI can call to retrieve relevant context or update its own knowledge base without manual SQL intervention.
Below is a conceptual implementation of an MCP server using the Python SDK, exposing two primary tools: search_knowledge for RAG retrieval and add_knowledge for real-time updates.
```python
from mcp.server.fastmcp import FastMCP
from openai import OpenAI
from supabase import create_client

mcp = FastMCP("KnowledgeStack")
client = OpenAI()
supabase = create_client("URL", "KEY")

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(input=text, model="text-embedding-3-small")
    return resp.data[0].embedding

@mcp.tool()
async def search_knowledge(query: str) -> str:
    """Search the knowledge base for relevant technical context."""
    # Call a Supabase RPC (a Postgres function) for cosine similarity search
    res = supabase.rpc("match_documents", {
        "query_embedding": embed(query), "match_threshold": 0.5,
    }).execute()
    return "\n---\n".join(item["content"] for item in res.data)

@mcp.tool()
async def add_knowledge(text: str, source: str) -> str:
    """Add a new piece of information to the knowledge base."""
    supabase.table("knowledge_entries").insert({
        "content": text, "embedding": embed(text), "source_uri": source,
    }).execute()
    return "Knowledge stored successfully."

if __name__ == "__main__":
    mcp.run()
```
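The search tool calls a Supabase RPC named match_documents, which the Step 1 schema does not create. A minimal definition following the common pgvector pattern might look like the sketch below; the exact signature and return columns are an assumption, so adapt them to your setup.

```sql
-- Cosine similarity search, exposed to the Supabase client as an RPC.
-- Assumed signature: must match the keys passed from the MCP server.
CREATE OR REPLACE FUNCTION match_documents(
    query_embedding VECTOR(1536),
    match_threshold FLOAT,
    match_count INT DEFAULT 5
)
RETURNS TABLE (content TEXT, similarity FLOAT)
LANGUAGE sql STABLE
AS $$
    SELECT k.content,
           1 - (k.embedding <=> query_embedding) AS similarity
    FROM knowledge_entries k
    WHERE 1 - (k.embedding <=> query_embedding) > match_threshold
    ORDER BY k.embedding <=> query_embedding
    LIMIT match_count;
$$;
```

Defining retrieval as a Postgres function keeps the similarity math server-side, so the MCP server ships only the query embedding over the wire.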
To integrate this with Claude Desktop, add the following configuration to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "knowledge-stack": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"]
    }
  }
}
```
Step 4: Operate It
Ongoing operations for an AI knowledge stack focus on retrieval quality and index maintenance. As the dataset grows, monitoring the distance threshold in similarity searches is critical to prevent "hallucinations" caused by retrieving irrelevant chunks.
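To build intuition for where to set that threshold, cosine similarity is easy to compute by hand. The toy vectors below are hypothetical three-dimensional stand-ins (real embeddings have 1536 dimensions, but the arithmetic is identical); the 0.5 cutoff mirrors the match_threshold used in the search tool.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
candidates = {
    "on-topic chunk": [0.9, 0.1, 1.1],    # points roughly the same way
    "off-topic chunk": [-1.0, 1.0, 0.0],  # points away from the query
}

THRESHOLD = 0.5
for name, vec in candidates.items():
    sim = cosine_similarity(query, vec)
    print(f"{name}: {sim:.2f} -> {'keep' if sim > THRESHOLD else 'drop'}")
```

Raising the threshold trades recall for precision: fewer chunks reach the LLM, but each one is more likely to be relevant, which is the main lever against retrieval-induced hallucination.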
| Dataset Size | Recommended Index | Primary Benefit |
|---|---|---|
| < 10k rows | None (Exact Search) | Perfect Recall |
| 10k - 500k rows | IVFFlat | Fast Build Time |
| > 500k rows | HNSW | High Query Speed/Recall |
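Acting on the table above, moving from IVFFlat to HNSW is a drop-and-create operation. A sketch, with two assumptions: the index name shown is Postgres's auto-generated default for the Step 1 `CREATE INDEX` (check `\di` if you named yours), and the `m`/`ef_construction` values are pgvector's defaults rather than tuned settings.

```sql
DROP INDEX IF EXISTS knowledge_entries_embedding_idx;

CREATE INDEX ON knowledge_entries
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```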
When migrating to HNSW, be aware that index build times are longer and memory consumption is higher. For those seeking a maintained, opinionated version of this architecture rather than a manual build, the NovCog Brain at novcog.dev provides a streamlined implementation of these patterns.