Most SEO audits die the day they’re delivered.
Someone spends six weeks crawling the site, pulling keyword reports, documenting broken links and missing meta descriptions. It all lands in a 30-page deck. The team reads it once, actions a few quick wins, and the rest quietly gathers dust.
By the time anyone revisits the recommendations, the site has changed, competitors have moved, and the audit is already stale.
This is the core problem with traditional SEO delivery. It treats search optimisation like a project when it’s actually a system — one that needs to run continuously, process new data as it arrives, and surface priorities that evolve with your site, your competitors, and the market.
Data engineering solved this problem years ago. The pattern is called a medallion architecture.
What is a medallion architecture?
Medallion architecture is a data design pattern popularised by Databricks in the late 2010s as part of their data lakehouse vision. It organises data into three progressive layers, each improving the structure and quality of the data as it moves through:
- Bronze — raw, unprocessed data. Everything lands here exactly as it arrived. Append-only, never modified. This is your source of truth.
- Silver — cleaned, validated, and enriched. Duplicates removed, metadata added, data structured and tagged. Ready to be useful.
- Gold — aggregated and analysis-ready. Business views that draw from silver and bronze for specific purposes. The layer that drives decisions.
The naming borrows from Olympic medals — progressively increasing value and trust. The underlying idea (layered data refinement) has existed since at least the late 1990s, but the bronze/silver/gold framework made it intuitive enough to become the default standard across most modern data teams.
As Johanan Ottensooser of Fiveonefour puts it, AI agents can “unlock the true benefits of medallion architectures, by automating the valuable but too often neglected work that makes medallion architectures really shine: enforcing standards, checking quality, generating docs — the hard, repetitive work that builds trust.”
The important principle: bronze is immutable. You never modify the raw data. Silver and gold are regenerable — if the enrichment logic changes or an agent miscategorises something, you simply reprocess from bronze. No data is ever lost.
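A minimal sketch of that principle, assuming hypothetical local file paths and a caller-supplied enrichment function (real pipelines would land data in a lakehouse, not a folder): bronze files are only ever appended, and silver is wiped and rebuilt from the full bronze history whenever the logic changes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

BRONZE = Path("data/bronze")   # append-only: raw exports, never modified
SILVER = Path("data/silver")   # regenerable: derived entirely from bronze

def land_in_bronze(source: str, records: list[dict]) -> Path:
    """Append a new raw snapshot; existing bronze files are never touched."""
    BRONZE.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = BRONZE / f"{source}_{stamp}.jsonl"
    with path.open("x") as f:  # "x" mode: fail loudly rather than overwrite
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path

def rebuild_silver(enrich) -> None:
    """Regenerate silver from the full bronze history.

    Safe to re-run whenever the enrichment logic changes or an agent
    miscategorised something: no raw data is ever lost."""
    SILVER.mkdir(parents=True, exist_ok=True)
    for raw_file in sorted(BRONZE.glob("*.jsonl")):
        out = SILVER / raw_file.name
        with raw_file.open() as src, out.open("w") as dst:
            for line in src:
                dst.write(json.dumps(enrich(json.loads(line))) + "\n")
```

Note the asymmetry: the bronze writer uses exclusive-create mode, while the silver writer happily overwrites, because silver is always reproducible from bronze.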
Why this matters for SEO and GEO
Traditional SEO workflows don’t have layers. Everything lives in the same place — raw crawl exports, keyword spreadsheets, analytics screenshots, and written recommendations all jumbled together in a shared drive or a slide deck. There’s no separation between the data, the analysis, and the decisions.
This creates three problems:
- No source of truth. When the data and the interpretation live in the same document, you can’t go back and reprocess. If someone misreads a ranking report, the error propagates into every recommendation downstream.
- No repeatability. Every audit starts from scratch because there’s no structured data layer to build on. The same crawl gets re-run, the same keyword gaps get re-identified, the same broken links get re-discovered.
- No currency. A document is a snapshot. The moment it’s finished, it starts decaying. Sites change daily, competitors publish new content weekly, and Google’s algorithms shift constantly.
Agentic SEO — AI agents monitoring, analysing, and recommending continuously instead of periodic manual audits — falls apart without a data architecture underneath it. Without that foundation, the agents are just expensive ChatGPT wrappers that hallucinate recommendations based on whatever context you happened to paste in.
The three layers applied to agentic SEO
The medallion pattern maps cleanly to a modern SEO and GEO workflow.
Bronze: raw, untouched data
This is your read-only data layer. Everything your tools produce lands here in its original form:
- Screaming Frog crawl exports — every URL, status code, redirect chain, canonical tag, and page title
- Serpstat or SE Ranking data — keyword rankings, competitor visibility scores, keyword gap analysis, backlink profiles
- Google Search Console — impressions, clicks, click-through rates, average positions, and index coverage reports
- GA4 — traffic, engagement, conversion events, and user journey data
- Google Core Web Vitals — performance scores, loading times, layout shift data
None of this gets modified. It’s append-only — each crawl, each ranking pull, each analytics export adds to the bronze layer rather than overwriting what came before. This means you build a historical record. You can answer “what changed?” not just “what’s broken?”
The bronze layer is also where you store raw page content — the actual HTML and copy from your site. This matters because content quality agents need the original text to evaluate, not a summary someone wrote about it.
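Because bronze is append-only, answering "what changed?" is just a comparison of two snapshots. A sketch, assuming crawls are stored as JSONL rows keyed by URL (the field names here are illustrative, not a Screaming Frog export format):

```python
import json
from pathlib import Path

def load_crawl(path: Path) -> dict[str, dict]:
    """Index one raw crawl export by URL. Read-only: bronze is never modified."""
    with path.open() as f:
        return {row["url"]: row for row in map(json.loads, f)}

def what_changed(old: dict[str, dict], new: dict[str, dict], field: str) -> dict:
    """Diff two bronze snapshots on one field, e.g. status code or title."""
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            url for url in old.keys() & new.keys()
            if old[url].get(field) != new[url].get(field)
        ),
    }
```

Run this over any two crawl dates and you get the history a one-off audit can never give you: which URLs appeared, disappeared, or flipped status between snapshots.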
Silver: enriched and structured
This is where AI agents earn their keep. The silver layer takes raw bronze data and adds structure, metadata, and classification.
For example, a content quality agent reads every page from the bronze layer and adds:
- A content quality score based on copywriting best practices
- Brand voice compliance flags (does this page sound like your brand?)
- Search intent classification (informational, navigational, commercial, transactional)
- Cannibalisation flags (which pages compete for the same terms?)
- Thin content warnings (pages below a depth or quality threshold)
An SEO agent does the same with crawl data — scoring each page’s technical health, mapping internal linking gaps and orphaned pages, identifying redirect chains and soft-404s, and recommending indexation strategy (index, noindex, consolidate). A competitor agent processes ranking and visibility data to score keyword opportunities, flag content gaps where competitors rank and you don’t, and classify ranking trends as rising, stable, or declining.
The critical point: silver runs through everything and tags it systematically. When you have 10,000+ pages on a domain, no human team can manually assess every page against every criterion. Agents can. And they can re-run the enrichment whenever the bronze data updates, keeping the silver layer current.
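The shape of a silver enrichment step looks something like this. The keyword heuristics below are a toy stand-in for an LLM classifier, and the field names and threshold are assumptions for illustration; the point is the pattern: bronze fields pass through untouched, and the agent's tags are added alongside them.

```python
def classify_intent(page: dict) -> str:
    """Toy stand-in for an LLM intent classifier (hypothetical heuristics)."""
    title = page.get("title", "").lower()
    if any(w in title for w in ("buy", "pricing", "discount")):
        return "transactional"
    if any(w in title for w in ("best", "review", "compare")):
        return "commercial"
    if any(w in title for w in ("how to", "what is", "guide")):
        return "informational"
    return "navigational"

def enrich_page(page: dict, min_words: int = 300) -> dict:
    """Silver-layer enrichment: original bronze fields plus agent tags."""
    return {
        **page,  # bronze data passes through unmodified
        "intent": classify_intent(page),
        "thin_content": page.get("word_count", 0) < min_words,
    }
```

Because the tags live next to the raw fields rather than replacing them, a bad classification is harmless: fix the logic and re-run the enrichment over bronze.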
Gold: prioritised and actionable
Gold is where decisions happen. It draws from both silver and bronze to assemble specific views for specific purposes:
- A prioritised task list ranked by business impact — “this broken link is on your highest-traffic landing page and costs you an estimated X clicks per month,” not just “fix broken link #47”
- Weekly or monthly automated reports — new technical issues, competitor movements, content quality flags, and opportunities surfaced proactively
- On-demand answers — “why did organic traffic drop last week?” The gold layer pulls the relevant silver metadata and bronze raw data to give you a real answer, not a guess
Gold is ephemeral. It’s a view, assembled on demand. You can tear it down and rebuild it differently for a different audience or a different question. The CMO gets a different gold view than the content team. Both are built from the same silver and bronze layers underneath.
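A gold view really can be this thin, because silver has already done the heavy lifting. A sketch of the prioritised task list, where "impact" is a hypothetical stand-in metric (severity weight times monthly organic clicks) rather than a recommended formula:

```python
def build_task_list(pages: list[dict], top_n: int = 10) -> list[dict]:
    """Gold view: rank silver-tagged issues by estimated business impact."""
    severity_weight = {"critical": 3.0, "warning": 1.5, "notice": 0.5}
    tasks = [
        {
            "url": page["url"],
            "issue": issue["type"],
            # illustrative impact estimate: severity x monthly clicks at stake
            "impact": severity_weight[issue["severity"]]
                      * page.get("monthly_clicks", 0),
        }
        for page in pages
        for issue in page.get("issues", [])
    ]
    return sorted(tasks, key=lambda t: t["impact"], reverse=True)[:top_n]
```

Swap the scoring function or the `top_n` and you have a different gold view for a different audience, built from the same silver data.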
This is also where Generative Engine Optimisation (GEO) comes in. As search shifts from ranking blue links to being cited in AI-generated answers, the gold layer needs a new question: “Will AI engines reference this content?” The silver layer scores for structured data, citation-worthiness, and semantic clarity. Gold then surfaces pages invisible to AI search alongside those invisible to traditional search.
Why the layers matter for LLM performance
There’s a technical reason this architecture matters beyond just good data hygiene: it keeps your AI agents in what HumanLayer calls the “smart zone.”
Every LLM has a context window — the amount of information it can hold and reason about at once. Research from Stanford (“Lost in the Middle”) found that LLMs have a U-shaped attention curve: they attend well to the beginning and end of context, but the middle is a dead zone. Accuracy can drop from 87% to 54% simply from context overload.
Without a medallion architecture, you’re dumping raw bronze data straight into the AI’s context window. Entire crawl exports. Full keyword reports. Thousands of rows of analytics data. The agent drowns in noise and produces vague, generic recommendations — or worse, confidently wrong ones.
With the architecture in place:
- Agents processing bronze → silver work on focused, scoped tasks. “Score this page against these criteria.” The context stays small and specific.
- Agents assembling gold pull from pre-enriched silver data rather than raw exports. They get structured metadata, not thousands of raw rows.
- Each agent operates in its own context, isolated from the others. The SEO agent doesn’t need to see the copywriting scores. The competitor agent doesn’t need the full crawl data.
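In code, the scoping amounts to selecting a handful of silver fields and capping the row count before anything reaches the model. A sketch with hypothetical field names:

```python
def scoped_context(silver_rows: list[dict], fields: list[str],
                   limit: int = 20) -> str:
    """Build a small, focused prompt context from pre-enriched silver data,
    instead of dumping raw bronze exports into the model's window."""
    lines = []
    for row in silver_rows[:limit]:  # hard cap keeps the context window small
        lines.append(", ".join(f"{k}={row.get(k)}" for k in fields))
    return "\n".join(lines)
```

The SEO agent might get `["url", "status", "redirect_chain"]` while the content agent gets `["url", "intent", "quality_score"]`: same silver layer, disjoint contexts, and the bulky raw fields (full HTML, thousand-row exports) never enter either window.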
This is the same principle behind context engineering — the practice of designing the information environment around an AI model so it can perform at its best. The medallion architecture is context engineering applied to data.
A practical example
I’m currently building this architecture for an enterprise client with well over 10,000 pages on Kentico, going through a platform migration, with multiple product lines and content published by different teams across the organisation. Here’s what the difference looks like in practice.
Without the architecture:
Someone runs a Screaming Frog crawl, exports to Excel, spends two weeks reviewing the results, writes a deck, presents it, and hopes someone acts on it. Three months later, half the recommendations are irrelevant because the migration broke things the deck didn’t anticipate.
With the architecture:
- Bronze: Screaming Frog runs on a schedule, appending each crawl. GA4 and Search Console data flows in daily. Serpstat pulls competitor rankings weekly. All raw, all preserved.
- Silver: Agents process each new crawl automatically. They score technical issues by severity, evaluate content against brand guidelines and search intent, and classify competitor keyword movements as threats or opportunities.
- Gold: Every Monday, the marketing team gets a report: “Here are the 10 highest-impact tasks this week, ranked by business impact.” When the CMS migration breaks a redirect chain on Thursday, the agent catches it the same day — not three months later in the next audit cycle.
The audit never goes stale because it never stops running.
Getting started
You don’t need to build the full architecture on day one. Start with the layers you can implement now and expand:
Week 1-2: Establish bronze
Get your data sources flowing reliably. Set up scheduled Screaming Frog crawls. Connect Google Search Console and GA4 exports. Pick a keyword tool (Serpstat, SE Ranking, Ahrefs) and set up regular data pulls. The goal is raw data arriving consistently, stored in its original format.
Week 3-4: Build your first silver agent
Start with the highest-value enrichment. For most sites, that’s a technical SEO agent that processes crawl data and flags issues by severity. This alone replaces the manual audit cycle and gives you a living technical health score.
Week 5-6: Add content and competitor silver
Build agents that score content quality and track competitor movements. These take longer because they need configuring — your brand guidelines, your positioning, your competitive set. But once they’re running, they’re running forever.
Week 7+: Assemble gold
With silver data flowing, start building the views that matter. A weekly prioritised task list. A monthly report for leadership. An on-demand Q&A interface where the team can ask questions and get data-backed answers.
The bottom line
The traditional SEO audit model — big project, long document, hand it over, start again in six months — doesn’t work in a world where sites, competitors, and search engines change constantly. And with AI-generated search results adding a whole new optimisation surface (GEO), the volume of signals to monitor has only increased.
The medallion architecture gives you a framework for building SEO that actually compounds: bronze preserves the raw truth, silver adds intelligence, and gold delivers decisions. Layer AI agents on top and you get something that runs continuously, catches problems as they emerge, and prioritises based on evidence rather than gut feel.
It’s not a 30-page deck. It’s a system. And unlike a deck, it’s still working six months after you build it.
Further reading
- What is a Medallion Architecture? (Databricks)
- Medallion Architectures and AI Agents (Fiveonefour)
- Context Engineering: The AI Skill Marketers Actually Need (Growth Method)
- Master MCP: A Clear Introduction for Marketers (Growth Method)
- From SEO to GEO: Preparing for the Agentic Web
- Agentic SEO: From Keywords to Continuous Discoverability (Siteimprove)