You Don't Need a Vector Database

How Schema.org structured data replaced an entire RAG pipeline with zero-cost graph traversal — eliminating vector databases, embedding models, and chunking heuristics while delivering higher accuracy for less than a penny per interaction.

How Schema.org Structured Data Replaced My Entire RAG Pipeline — and Why Nobody Else Thought to Try

The Part Where I Accidentally Solve a Billion-Dollar Problem

I didn't set out to challenge the retrieval-augmented generation industry. I was building a website.

My site — this one, the one you're on right now — is hand-coded. Pure HTML, CSS, JavaScript. No React. No Next.js. No frameworks. Every page is authored by hand, every section structured intentionally, every piece of content placed where I want it. That's not a flex. That's just how I build things. I want to know what the machine is doing at every layer because the moment I stop knowing, I've lost control of my own system.

As part of building the site, I wrote Schema.org structured data for every page. JSON-LD blocks embedded in the <head> tags. Standard practice if you care about SEO. Google uses it for rich results and Knowledge Graph integration. It tells search engines what your content is — not just what words appear on the page, but what type of thing it is, who created it, when it was published, what it relates to, how it fits into the larger structure of the site.
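For concreteness, here's a minimal JSON-LD block of the kind described (the URL, identifiers, and property values are hypothetical, not taken from the actual site):

```json
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "@id": "https://example.com/frameworks/propulsion#article",
  "headline": "Propulsion Framework Overview",
  "author": { "@id": "https://example.com/#person" },
  "isPartOf": { "@id": "https://example.com/frameworks#series" },
  "datePublished": "2024-01-15",
  "dateModified": "2024-06-02"
}
```

The @id values are what let entities on different pages reference each other by stable identifier.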

I wrote a lot of it. Not just the basics — not just an Organization entity and a WebSite entity like most sites have. I wrote structured data for every framework, every section of every framework, every creative work, every research project, every blog post, every professional service. 260 entities across 28 Schema.org types, connected by 311 typed relationships. Each HTML section has a corresponding structured data counterpart. Two parallel representations of the same content — one for humans, one for machines.

Then I built an AI assistant.

And I realized the structured data I'd already written for Google was the only knowledge base the assistant would ever need.

What RAG Actually Is (And Why It's Overcomplicated)

Let me walk through what the industry does right now when they want an AI to answer questions about their own content. This is the standard Retrieval-Augmented Generation pipeline:

First, you ingest your content — HTML pages, PDFs, documentation, whatever. Then you chunk it. You break the content into segments, usually by token count or sentence boundaries, with overlap windows so each chunk has some context from the chunks around it. Then you embed each chunk — you run it through an embedding model that converts text into a high-dimensional vector, a point in abstract mathematical space where "similar" content ends up near other "similar" content. Then you index those vectors in a vector database — Pinecone, Weaviate, Chroma, Qdrant, pick your poison. Then when a user asks a question, you embed their question too, find the nearest neighbor vectors in the database, rerank the results for relevance, and inject the top-K chunks into the LLM prompt as context.

That's seven steps. Seven places where information gets transformed, compressed, decontextualized, or lost. Seven layers of infrastructure you need to build, maintain, monitor, and pay for. And at the end of it, the model still hallucinates, because chunked text without explicit typing, relationships, or provenance is fundamentally ambiguous. The model is doing its best with fragments. Sometimes its best isn't good enough.
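To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap windows, the heuristic this post argues is unnecessary. The sizes and function name are illustrative:

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into word chunks of `size`, each sharing `overlap` words
    with the previous chunk. The boundaries are arbitrary: the algorithm
    knows nothing about the content's actual structure."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]
```

Note what this can't do: it can't tell a section heading from a footnote, and a concept that straddles a chunk boundary gets split no matter how large the overlap is.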

The entire pipeline exists because of one assumption: web content is unstructured and must be transformed into something machine-retrievable.

That assumption is wrong.

The Insight

Schema.org structured data, when properly authored, already provides everything a RAG system needs. I'm going to say that again because it's the core thesis of this entire post and nobody in the AI industry has noticed it yet.

The structured data you already wrote for Google is a complete, pre-chunked, semantically typed, relationship-rich, provenance-tracked knowledge base for AI retrieval.

Let me break down why.

Pre-chunked. Each JSON-LD entity is a self-contained semantic unit. A TechArticle. A ScholarlyArticle. A Person. A ResearchProject. A GraphFragment representing a specific section of a specific page. These aren't arbitrary text segments created by a chunking algorithm that doesn't understand your content. They're meaningful units you authored intentionally. No chunking heuristics needed.

Semantically typed. Every entity carries an explicit @type annotation. The system doesn't need to run an embedding model to figure out what something is. It's declared. A TechArticle is a technical article. A Game is a game. A SoftwareApplication is a software application. No inference required. No embedding similarity needed to understand what kind of thing you're looking at.

Relationship-rich. Schema.org predicates — hasPart, isPartOf, about, subjectOf, founder, mentions, knowsAbout — encode the relationships between entities explicitly. The propulsion framework isPartOf the framework series. The framework series hasPart nine frameworks. The site is about a Person. That Person is the founder of an Organization. These relationships are declared, not inferred. No vector proximity calculation needed to find related content. The relationships are right there.

Provenance-tracked. Every entity in the graph tracks its source file, publication date, modification date, script index, entity index, and graph version. Citations aren't something you bolt on after retrieval. They're structural. Every piece of information carries its own audit trail.

The retrieval layer that the AI industry spent billions building separately has existed on every properly built website since Google, Microsoft, Yahoo, and Yandex co-launched Schema.org in 2011.

The reason nobody noticed is that the SEO world and the AI world don't talk to each other. The people who understand Schema.org are content strategists. The people building RAG pipelines are ML engineers. Two communities sitting on opposite sides of the same solution, neither one looking across the aisle.

I looked across because I'm not in either camp. I wrote both layers on the same site. And I went: why would I build a second system when the first one already has everything the AI needs?

AIR: The Architecture

AIR — Artificial Intelligence Registrar — is the system I built on top of this insight. It's deployed on this site right now. If you've clicked any of the "Explain this section" buttons on my framework pages, you've used it. If you've opened the companion overlay and asked a question, you've talked to it.

Here's how it works.

Build Time: HTML → Knowledge Graph

At build time, a script extracts all JSON-LD from the site's HTML pages. It resolves cross-page @id references — an entity on one page referencing an entity on another page by its identifier. It materializes the result as a knowledge graph: 260 nodes, 311 edges, stored as two JSONL files. Nodes in one file. Edges in the other. A manifest tracks SHA-256 checksums for cache invalidation.
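A sketch of what a build step like this might look like. The regex-based extraction, field names, and file handling are assumptions for illustration, not the actual build tool:

```python
import hashlib, json, re

SCRIPT_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def extract_graph(pages):
    """pages: {path: html_string}. Returns (nodes, edges)."""
    nodes, edges = {}, []
    for path, html in pages.items():
        for script_idx, m in enumerate(SCRIPT_RE.finditer(html)):
            data = json.loads(m.group(1))
            for entity_idx, ent in enumerate(data.get("@graph", [data])):
                node_id = ent.get("@id") or f"{path}#s{script_idx}e{entity_idx}"
                nodes[node_id] = {**ent, "_source": path,
                                  "_script": script_idx, "_entity": entity_idx}
                # Any property whose value is a pure @id reference becomes a typed edge.
                for pred, val in ent.items():
                    for v in (val if isinstance(val, list) else [val]):
                        if isinstance(v, dict) and set(v) == {"@id"}:
                            edges.append({"from": node_id, "predicate": pred,
                                          "to": v["@id"]})
    return nodes, edges

def write_jsonl(nodes_path, edges_path, nodes, edges):
    lines_n = "\n".join(json.dumps(n) for n in nodes.values())
    lines_e = "\n".join(json.dumps(e) for e in edges)
    open(nodes_path, "w").write(lines_n)
    open(edges_path, "w").write(lines_e)
    # Manifest tracks checksums so consumers can invalidate caches.
    return {p: hashlib.sha256(s.encode()).hexdigest()
            for p, s in [(nodes_path, lines_n), (edges_path, lines_e)]}
```

Running something like this over every page yields the two JSONL files; the SHA-256 manifest lets downstream consumers detect when the graph changed.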

The dominant node type is GraphFragment — 187 of them. These represent per-section structured data entities that correspond one-to-one with HTML sections on each framework page. Every section you see on the page has a machine-readable counterpart in the graph. This is what makes the per-section buttons possible. When you click "Explain this section" on section 2.4 of a framework, the system doesn't chunk the page and hope it retrieves the right fragment. It loads the exact GraphFragment entity for that section, along with its graph neighborhood.

Query Time: Graph Traversal as Retrieval

When someone queries AIR, the system doesn't embed the question and search a vector index. It does graph traversal.

It loads the target entity. It finds all edges connected to that entity. It ranks the connected neighbors by a priority map over relationship predicates:

Priority      Predicate     Semantic Role
0 (highest)   isPartOf      Structural containment
1             hasPart       Composition
2             about         Topical relevance
3             subjectOf     Inverse topicality
4             citation      Referenced works
5             mentions      Loose references
6             knowsAbout    Knowledge domain
7 (lowest)    relatedTo     General relation

This priority map replaces embedding similarity and reranking. It's deterministic. It's interpretable. You can trace exactly why any given piece of context was selected — because it has a hasPart relationship with the target entity, because it's the object of the parent page's about predicate, because it mentions the relevant framework. The "why" is always visible. There's no embedding space to debug, no cosine similarity threshold to tune, no reranking model to evaluate.

The system resolves the top-5 neighbors by priority, serializes the entity metadata, neighbor summaries, relationship labels, and full provenance chains into a prompt-ready payload, and hands it to the LLM. The model receives pre-structured, pre-typed, relationship-contextualized information with source tracking built in.

That's the entire retrieval pipeline. Load entity. Traverse edges. Rank by predicate priority. Serialize context. Call LLM.
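As a sketch (assuming in-memory dicts shaped like the JSONL files described earlier, with hypothetical function names), the whole query-time retrieval step fits in a few lines:

```python
# Predicate priorities from the table above; lower number = stronger signal.
PRIORITY = {"isPartOf": 0, "hasPart": 1, "about": 2, "subjectOf": 3,
            "citation": 4, "mentions": 5, "knowsAbout": 6, "relatedTo": 7}

def retrieve_context(target_id, nodes, edges, k=5):
    """Deterministic retrieval: collect edges touching the target entity,
    rank neighbors by predicate priority, keep the top-k."""
    hits = []
    for e in edges:
        if e["from"] == target_id or e["to"] == target_id:
            neighbor = e["to"] if e["from"] == target_id else e["from"]
            hits.append((PRIORITY.get(e["predicate"], 99), e["predicate"], neighbor))
    hits.sort(key=lambda h: h[0])
    return [{"entity": nodes[n], "via": pred}
            for _, pred, n in hits[:k] if n in nodes]
```

Every element of the result carries the predicate it arrived through, which is what makes the "why was this retrieved" question answerable by inspection.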

No embedding step. No vector index. No similarity search. No reranking. No chunking. No vector database.

Why It Can't Hallucinate (Mostly)

AIR's anti-hallucination architecture isn't one mechanism. It's six layers of defense-in-depth.

Content grounding. The LLM receives typed entities with stable identifiers, declared properties, and tracked provenance. Not decontextualized text fragments. The model knows what type of thing it's looking at, where it came from, and when it was last updated.

Citation enforcement. AIR's persona configuration mandates a Sources block on every response. If the model omits it, the system automatically reprompts: "The previous draft omitted the required Sources block. Provide a corrected response." This is programmatic, not a suggestion in the system prompt that the model might ignore.

RAG policy enforcement. Response-level policies define minimum citation counts, maximum reprompt attempts, and specific reprompt instructions. The citation requirement is structural, not optional.

Structural provenance. Every prompt includes the entity's source file, script index, entity index, last modified date, and manifest version. The model can't cite something that doesn't exist in the graph because the provenance trail is part of the input context. There's no room to invent a source.

Low temperature. AIR generates at temperature 0.2. Creative sampling is minimized. Character and personality come from the persona system — the voice profile, the behavioral modes, the lexicon directives — not from the model rolling dice on word choice. The data stays grounded. The persona provides flavor.

Explicit prohibition. The persona manifest prohibits fabrication outright: "Making things up. If you don't know, say so with grace." and "False certainty. Better to acknowledge mystery than pretend omniscience."
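The citation-enforcement layer lends itself to a short sketch. The response checks, policy values, and the generic call_llm callable below are illustrative assumptions, not AIR's actual implementation:

```python
REPROMPT = ("The previous draft omitted the required Sources block. "
            "Provide a corrected response.")

def enforce_citations(call_llm, prompt, min_citations=1, max_attempts=2):
    """Call the model, verify the response structurally, reprompt on failure.
    `call_llm` is any callable mapping a message list to a reply string."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_attempts + 1):
        reply = call_llm(messages)
        # Programmatic checks: a Sources block and a minimum citation count.
        if "Sources:" in reply and reply.count("[") >= min_citations:
            return reply
        messages += [{"role": "assistant", "content": reply},
                     {"role": "user", "content": REPROMPT}]
    return reply  # best effort after exhausting reprompt attempts
```

The point is that the requirement lives in code, outside the model: a response without sources never reaches the user unchecked.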

Can AIR still hallucinate? Yes. In multi-turn conversations, context can drift away from the graph. When multiple entity neighbors are in the prompt, the model could blend properties across entities. These are real failure modes. But they're edge cases, not the default behavior. The default is cited, grounded, traceable responses.

The Cost Comparison That Ends the Conversation

Here's where it gets uncomfortable for the vector database industry.

A typical AIR interaction uses 1,500–3,200 input tokens and generates about 400 output tokens. At current LLM pricing, that's $0.004–$0.008 per interaction. Four tenths of a cent to eight tenths of a cent.
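The arithmetic is easy to check. Assuming illustrative prices of $2.00 per million input tokens and $2.50 per million output tokens (actual provider pricing varies by model):

```python
def interaction_cost(input_tokens, output_tokens,
                     usd_per_m_in=2.00, usd_per_m_out=2.50):
    """Cost of one LLM call at assumed per-million-token prices."""
    return (input_tokens / 1e6 * usd_per_m_in
            + output_tokens / 1e6 * usd_per_m_out)

low = interaction_cost(1500, 400)   # about $0.004
high = interaction_cost(3200, 400)  # about $0.007
```

At these assumed rates, even the heaviest interaction stays under a cent.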

The monthly infrastructure cost is zero. The graph is two JSONL files on disk. There's no database to host, no embedding service to pay for, no ingestion pipeline to maintain.

A comparable traditional RAG system would cost $70–$500/month minimum for a managed vector database (Pinecone Starter starts at $70). Plus embedding costs. Plus reranking costs. Plus the engineering time to maintain the chunking strategy, monitor index drift, and re-embed when content changes. The LLM cost is the same either way — but AIR eliminates every other cost entirely.

And the traditional system still hallucinates, because chunks are decontextualized fragments and the model is interpolating meaning from proximity. AIR resists hallucination by construction, because there's almost nothing to interpolate from. The entities are self-contained. The relationships are declared. The provenance is tracked. The model cites what it can see and can't invent what isn't there.

Higher accuracy. Lower cost. Zero infrastructure. Less than a penny per interaction.

For indie developers, small businesses, content creators, personal sites — the people who would benefit most from an AI assistant on their site — the traditional RAG pipeline is a non-starter. The infrastructure cost alone puts it out of reach. AIR makes it accessible to anyone who can write structured data.

The Triple-Consumer Architecture

Here's the part that I think is actually the most elegant, and it's the part that happened by accident.

The structured data on this site serves three consumers simultaneously:

Google's crawler reads it for search ranking. Rich results, Knowledge Graph integration, semantic understanding of page content.

The neural graph on the homepage reads it for interactive visualization. The same conceptual structure that AIR traverses is rendered as the 2.5D animated graph you see when you visit the site. The node labels, the edge connections, the framework types — they're derived from the same data.

AIR reads it for question answering. Graph traversal, context selection, cited responses.

One data source. Three outputs. Zero divergence.

When I update a framework page, I update the HTML and the corresponding structured data together. The graph rebuilds. Google re-indexes. The neural graph re-renders. AIR's knowledge updates. One write propagates to all three consumers. There's no synchronization problem. There's no content drift. There's no "the chatbot says something different than the website" because the chatbot and the website are reading from the same source of truth.

This is also why Google ranks my pages well. I didn't do SEO in the traditional sense — I didn't stuff keywords or build backlinks or hire a content strategist. I did data architecture. I structured the content correctly, declared the types and relationships explicitly, and gave Google a complete machine-readable ontology of my entire body of work. Google interpreted good architecture as authority. Because it is.

Thinking Like the Machine

The reason nobody else built this is that nobody else thought like the AI.

Everyone designing RAG systems is thinking like a human designing a system for a machine. They're projecting human information processing patterns onto a model that doesn't share them.

Humans need surrounding context to understand a sentence. So engineers build chunking with overlap windows to preserve "context." But the model doesn't need a paragraph before and after to understand a typed entity with declared relationships. The overlap padding is noise. It's context the model doesn't want, occupying tokens that could be used for actual retrieval.

Humans infer types from surrounding text. "This paragraph is probably about a research project because the previous paragraph mentioned research." So engineers use embedding similarity to group content by inferred type. But the model would rather just be told: @type: ResearchProject. Declaration beats inference. Every time.

Humans hold relationships in memory by reading narrative. "I remember from earlier in the document that this framework relates to that one." So engineers try to preserve those narrative connections through careful chunking and overlap. But the model would rather have an explicit edge: hasPart, isPartOf, about. A declared relationship is unambiguous. A narrative implication is not.

If you asked the model to design its own input format, it would design JSON-LD. Typed entities. Explicit relationships. Source tracking. Minimal token footprint. Maximum information density. Zero ambiguity.

That's what Schema.org already is. A format designed for machines to understand web content. The AI industry just forgot that LLMs are also machines.

What This Means

I want to be precise about the scope of this thesis. I'm not saying vector databases are bad. They're excellent for large-scale, multi-source, heterogeneous content retrieval. If you're building a system that needs to search across millions of documents from dozens of sources, you need embeddings and you need a vector database. The scale ceiling on a JSONL graph is real — 260 nodes works. 260,000 nodes would need a different architecture.

What I am saying is that for a massive class of deployments — single-site AI assistants, documentation chatbots, portfolio explainers, content-grounded knowledge systems — the retrieval layer already exists. It's been sitting in the <head> of every properly built website for fourteen years. Nobody used it because nobody thought to look.

The adoption path isn't "everyone hand-writes 260 Schema.org entities." The path is a semi-automated extraction pipeline — a build tool that scans well-structured HTML and proposes entities for review. WordPress with Yoast SEO already generates structured data. Webflow generates it. Squarespace generates it. A plugin that materializes existing Schema.org markup into a traversable knowledge graph would make this architecture accessible to millions of sites overnight.

The graph extraction pipeline is the product. Open-sourcing it is not future work. It's the highest-leverage next step.

The Part Where Strangers Validated It Without Being Asked

Last week, fifteen people from four continents found an unpromoted, unlinked framework page on this site in a thirty-minute window. Direct traffic. No referrer. Someone copied the URL into a private channel — Discord, Telegram, Signal, something — and fourteen other people clicked it from São Paulo, Madrid, Belgrade, Kuala Lumpur, and scattered US cities.

Three of them opened AIR. They talked to it. One asked it to surface key concepts from an 8,000-word speculative propulsion framework. AIR returned a structured briefing with bracket citations pointing to specific document sections, identified the theoretical foundations and core subsystems, and ended with a strategic recommendation: "Treat ZPSPS as a framework for frontier R&D portfolio design, not a near-term product."

That recommendation wasn't in the structured data. AIR inferred it from the architecture of the roadmap and the tone of the document. The structured data kept it grounded. The persona gave it judgment. The total cost of that interaction was fractions of a cent.

I didn't promote the page. I didn't share the link. I didn't know anyone was there until I checked analytics. A system I built mostly just to see if it could was serving strangers from four continents with cited, grounded, strategically sharp responses to questions I never anticipated. Without my knowledge. Without my involvement. For less than a penny.

That's what happens when you build the architecture right and then get out of the way.

The Open Question

What happens when every well-structured website can have a zero-hallucination AI assistant for the cost of a few API calls?

What happens when the retrieval layer is free because Google already made everyone build it?

What happens when the $2B vector database industry discovers that the problem they're solving was already solved by a web standard from 2011?

I don't know. But I built this mostly just to see if I could. And it works.

The AIR system is live on this site. Open the companion overlay and ask it something. Click "Explain this section" on any framework page. Watch the citations. Check the cost.

If you're an engineer and you want to see how it works under the hood, the architecture is documented. If you're a conference organizer and you want a 20-minute talk that makes your RAG panel uncomfortable, I'm in Boise.