Is Your Data Platform Agent-Ready? Most Aren't. Here's Why

May 15

Robotic agents refusing inappropriate data

If your data platform were a restaurant, you'd have spent the last decade perfecting the buffet. Big trays of pre-cooked data, laid out at set times, ready for hungry humans to wander past with a plate and a dashboard. It works beautifully for the lunch rush of Monday-morning exec reviews.

Now imagine the next wave of diners isn't human at all. They're autonomous agents that show up at 3am, want a bespoke seven-course tasting menu, need to know exactly which farm the carrots came from, and will make decisions on your behalf based on what they're served. The buffet model breaks. Fast.

That's where most enterprise data platforms are right now: built for batch consumption by humans, and now being asked to feed contextual reasoning by agents. The cracks are already showing. So how do we move towards an agent-ready data platform?

The dashboard era was built for a different consumer

For the last fifteen years, we've optimised our stacks around a fairly predictable pattern: ingest, transform, model, serve to a BI tool, refresh overnight. Even the shiniest lakehouse architectures are fundamentally shaped by this rhythm. Bronze, silver, gold. Curated marts. Semantic models that live inside Power BI or Tableau because that's where the human eyeballs were.

Agents don't work like that. An agent doesn't open a dashboard. It asks questions, chains decisions together, and acts. It needs to understand what a "customer" means in your business, not just query a table called dim_customer. It needs to know whether the data it's looking at is fresh enough to trust, whether it's allowed to use it, and what happened upstream to produce it.

Treating agents as just another consumer of your existing pipelines is the most common mistake I'm seeing right now. They aren't. They're a fundamentally different class of workload, and the bottleneck isn't compute or storage. It's context.

The four context layers agents actually need

When we audit a platform for agent readiness, we look for four layers of context. Most enterprises have one, maybe two. Very few have all four.

1. Semantic context

Agents need to know what things mean. Not just the column name, but the business definition, the synonyms, the relationships, the edge cases. "Revenue" in one part of your business is gross, in another it's net of returns. A human analyst knows this. An agent doesn't, unless you've made it explicit at the platform layer, not buried in a BI tool.
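To make that concrete, here is a minimal Python sketch of semantic context held at the platform layer. The `MetricDefinition` and `SemanticRegistry` names, and the example expressions, are hypothetical; in practice this would live in your catalogue or semantic layer, not in application code. What it illustrates is the behaviour you want: every term an agent might use resolves to exactly one governed definition, and a term that would become ambiguous is rejected rather than silently overwritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One governed business definition (hypothetical schema)."""
    name: str
    description: str
    expression: str                 # illustrative SQL expression
    synonyms: tuple[str, ...] = ()  # other words agents and users use for it


class SemanticRegistry:
    """Maps every term an agent might use to exactly one definition."""

    def __init__(self) -> None:
        self._by_term: dict[str, MetricDefinition] = {}

    def register(self, metric: MetricDefinition) -> None:
        terms = [t.lower() for t in (metric.name, *metric.synonyms)]
        clashes = [t for t in terms if t in self._by_term]
        if clashes:
            # Refuse ambiguity up front, before any term is bound.
            raise ValueError(f"terms already defined: {clashes}")
        for term in terms:
            self._by_term[term] = metric

    def resolve(self, term: str) -> MetricDefinition:
        try:
            return self._by_term[term.lower()]
        except KeyError:
            raise LookupError(f"no governed definition for {term!r}") from None


registry = SemanticRegistry()
registry.register(MetricDefinition(
    "gross_revenue", "Invoiced amount before returns",
    "SUM(invoice_amount)", synonyms=("gross sales",)))
registry.register(MetricDefinition(
    "net_revenue", "Invoiced amount net of returns",
    "SUM(invoice_amount) - SUM(return_amount)", synonyms=("revenue",)))

print(registry.resolve("Revenue").name)  # net_revenue
```

Here the bare word "revenue" is bound explicitly to the net-of-returns definition, so an agent asking for revenue gets one answer instead of guessing between two.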

2. Lineage context

Where did this number come from? What pipelines produced it? When was it last refreshed? If an agent is making a pricing decision or flagging a supply chain risk, it needs to reason about the trustworthiness of its inputs. Lineage stops being a compliance nice-to-have and becomes a runtime dependency.
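A minimal sketch of lineage as a runtime dependency, assuming each asset exposes its direct upstream inputs and its last refresh time. `AssetLineage`, `stale_inputs` and the table names are illustrative, not a real catalogue API. It walks the whole upstream graph, because a gold table refreshed an hour ago can still be built on inputs that are days old.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AssetLineage:
    """Hypothetical lineage record: an asset, its direct inputs, and its last refresh."""
    name: str
    last_refreshed: datetime
    upstream: tuple[str, ...] = ()


def stale_inputs(asset: str, catalog: dict[str, AssetLineage],
                 max_age: timedelta, now: datetime) -> list[str]:
    """Return every asset in `asset`'s lineage (itself included) older than max_age."""
    stale, seen, stack = [], set(), [asset]
    while stack:
        name = stack.pop()
        if name in seen:  # shared upstreams are only checked once
            continue
        seen.add(name)
        record = catalog[name]
        if now - record.last_refreshed > max_age:
            stale.append(name)
        stack.extend(record.upstream)
    return sorted(stale)


now = datetime(2025, 5, 15, 9, 0, tzinfo=timezone.utc)
catalog = {
    "gold.pricing_features": AssetLineage(
        "gold.pricing_features", now - timedelta(hours=1),
        ("silver.orders", "silver.competitor_prices")),
    "silver.orders": AssetLineage(
        "silver.orders", now - timedelta(hours=2), ("bronze.orders",)),
    "silver.competitor_prices": AssetLineage(
        "silver.competitor_prices", now - timedelta(days=3)),
    "bronze.orders": AssetLineage("bronze.orders", now - timedelta(hours=2)),
}

stale = stale_inputs("gold.pricing_features", catalog, timedelta(hours=24), now)
if stale:
    # An agent would defer or escalate the pricing decision here instead of acting.
    print(f"Not safe to act; stale inputs: {stale}")
```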

3. Governance context

Can this agent, acting on behalf of this user, in this context, access this data? Row-level security and column masking were designed around predictable human personas. Agents create dynamic, composable access patterns that most governance models can't handle. If your governance lives in spreadsheets or static role mappings, you have a problem.
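One way to express "this agent, acting on behalf of this user, in this context" is an attribute-based check rather than a static role. The sketch below is hypothetical: the `AccessRequest` and `DataAsset` fields and the classification labels are assumptions, not any product's policy model. It encodes two rules: an agent never sees more than both the user it acts for and its own provisioning allow, and access is bound to the purpose of the task.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical context carried with every agent request."""
    agent_id: str
    on_behalf_of: str
    user_clearances: frozenset[str]  # what the human is cleared for
    agent_scopes: frozenset[str]     # what this agent was provisioned for
    purpose: str                     # the task the agent says it is doing


@dataclass(frozen=True)
class DataAsset:
    name: str
    classification: str              # e.g. "public", "pii", "financial" (illustrative labels)
    allowed_purposes: frozenset[str]


def is_allowed(request: AccessRequest, asset: DataAsset) -> bool:
    if asset.classification != "public":
        # Effective clearance is the intersection of the user's and the agent's.
        if asset.classification not in request.user_clearances & request.agent_scopes:
            return False
    # Purpose binding: the same user and agent can be allowed for one task and denied for another.
    return request.purpose in asset.allowed_purposes


customers = DataAsset("crm.customers", "pii", frozenset({"customer_support"}))
triage = AccessRequest("triage-agent", "alice", frozenset({"pii"}),
                       frozenset({"pii"}), "customer_support")

print(is_allowed(triage, customers))                                     # True
print(is_allowed(replace(triage, purpose="marketing"), customers))       # False
print(is_allowed(replace(triage, agent_scopes=frozenset()), customers))  # False
```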

4. Behavioural context

How is the data actually being used? Which queries are common, which are anomalous, which agents are looping or hallucinating? Behavioural telemetry on agent activity is the new observability frontier, and almost no one is doing it well yet.
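As a starting point for behavioural telemetry, here is a sketch of a single signal: spotting an agent that keeps issuing the same query shape inside a short window, a common symptom of looping. The regex normalisation, the thresholds and the `LoopDetector` name are assumptions for illustration; production query fingerprinting would use a proper SQL parser.

```python
import hashlib
import re
from collections import defaultdict, deque


def fingerprint(sql: str) -> str:
    """Collapse literals and whitespace so repeats of the same query shape match."""
    text = re.sub(r"\s+", " ", sql.strip().lower())
    text = re.sub(r"'[^']*'", "?", text)  # string literals
    text = re.sub(r"\b\d+\b", "?", text)  # numeric literals
    return hashlib.sha256(text.encode()).hexdigest()[:16]


class LoopDetector:
    """Flags an agent that repeats one query shape too often in a sliding window."""

    def __init__(self, max_repeats: int, window_seconds: float) -> None:
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # (agent_id, fingerprint) -> query timestamps

    def record(self, agent_id: str, sql: str, ts: float) -> bool:
        """Record one query at time `ts`; return True if the agent now looks like it is looping."""
        events = self._seen[(agent_id, fingerprint(sql))]
        events.append(ts)
        while events and ts - events[0] > self.window_seconds:  # expire old events
            events.popleft()
        return len(events) > self.max_repeats


detector = LoopDetector(max_repeats=3, window_seconds=60)
flags = [detector.record("agent-7", f"SELECT * FROM orders WHERE id = {i}", ts=float(i))
         for i in range(5)]
print(flags)  # [False, False, False, True, True]
```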

An agent-ready data stack

Practical retrofit patterns mid-migration

The good news: if you're mid-migration to a lakehouse like Databricks, you're in a better position than you think. You don't need to rip up your roadmap. You need to add context as a first-class workstream alongside it.

A few patterns that have worked on the platforms we've delivered:

Promote Unity Catalog from a governance tool to a semantic layer. Most teams use it for access control and call it done. Use the tags, comments, and certified asset features to encode business meaning. Make it the source of truth for definitions, not a side effect of them.
Treat your semantic model as code. Pull definitions out of BI tools and into version-controlled, declarative artefacts that agents can query directly. Tools like dbt's semantic layer or Databricks' metric views are getting genuinely useful here.
Wire lineage into runtime, not just documentation. Lineage that only shows up in a catalogue UI is dead lineage. Agents need to query it programmatically to assess data freshness and provenance before acting.
Build an agent gateway. Don't let every agent connect directly to your warehouse. Route through a controlled layer that handles authentication, governance, prompt logging, and rate limiting. You'll thank yourself in month six.
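To show how the gateway responsibilities fit together, here is a hypothetical `AgentGateway` sketch. The API-key lookup, the `authorise` callback and the `execute` stand-in are all assumptions; a real gateway would delegate to your warehouse's own authentication and policy engine. The parts worth copying are the order of the checks and the fact that every request, allowed or denied, lands in the audit log alongside the prompt that triggered it.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-agent rate limiter: refills `rate` tokens per second, bursting up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.tokens, self.updated = capacity, now

    def try_take(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AgentGateway:
    """Hypothetical single entry point between agents and the warehouse."""

    def __init__(self,
                 api_keys: dict[str, str],                    # API key -> agent id
                 authorise: Callable[[str, str, str], bool],  # (agent, user, table) -> allowed?
                 execute: Callable[[str], list],              # stand-in for the real query call
                 rate: float = 1.0, burst: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._api_keys, self._authorise, self._execute = api_keys, authorise, execute
        self._rate, self._burst, self._clock = rate, burst, clock
        self._buckets: dict[str, TokenBucket] = {}
        self.audit_log: list[dict] = []

    def query(self, api_key: str, on_behalf_of: str, table: str, prompt: str) -> list:
        agent = self._api_keys.get(api_key)  # 1. authenticate
        if agent is None:
            outcome = "unauthenticated"
        else:
            now = self._clock()
            bucket = self._buckets.get(agent)
            if bucket is None:
                bucket = self._buckets[agent] = TokenBucket(self._rate, self._burst, now)
            if not bucket.try_take(now):  # 2. rate limit
                outcome = "rate_limited"
            elif not self._authorise(agent, on_behalf_of, table):  # 3. governance
                outcome = "forbidden"
            else:
                outcome = "allowed"
        # 4. Log every request, allowed or denied, with the prompt that triggered it.
        self.audit_log.append({"agent": agent, "user": on_behalf_of, "table": table,
                               "prompt": prompt, "outcome": outcome})
        if outcome != "allowed":
            raise PermissionError(outcome)
        return self._execute(table)


clock = [0.0]  # fake clock so the example is deterministic
gateway = AgentGateway(
    api_keys={"key-123": "triage-agent"},
    authorise=lambda agent, user, table: table != "hr.salaries",
    execute=lambda table: [f"rows from {table}"],
    rate=1.0, burst=2.0, clock=lambda: clock[0],
)
print(gateway.query("key-123", "alice", "crm.tickets", "summarise open tickets"))
```

Injecting the clock keeps the rate limiting testable, and raising `PermissionError` with the reason lets the calling agent tell "not allowed" apart from "slow down".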

A 90-day readiness assessment

If you're a platform leader trying to work out how exposed you are, here's the framework we use with clients. Three phases, 30 days each.

Days 1-30: Inventory and honest assessment. Where do your semantic definitions actually live today? Map them. How much of your lineage is captured automatically vs. tribal knowledge? What does your governance model look like when the consumer isn't a named human? Pick three real or hypothetical agent use cases and trace what would break.

Days 31-60: Close the highest-risk gaps. Usually this is semantic consolidation and governance modernisation. Get your top 20 business metrics defined once, in one place, accessible via API. Move your access control out of static roles into attribute-based policies that can flex around agent identity.

Days 61-90: Pilot with a real agent workload. Don't theorise. Pick a contained, valuable use case, such as customer service triage, supply chain anomaly detection, or finance close support, and run it end to end. Measure not just whether it works, but where it generates incorrect context, where governance breaks down, and where lineage is missing. That's your real roadmap.

The leaders who'll win the next 18 months

The CDOs and platform leaders who'll come out of this period ahead aren't the ones with the biggest GenAI budgets. They're the ones who recognise that data context is now a platform capability, not a metadata afterthought. The ones treating semantic, lineage, governance, and behavioural layers as foundational infrastructure, the same way we treated compute and storage a decade ago.

This isn't theoretical work. It's delivery work. And it's the kind of thing we've spent the last few years embedding into client teams mid-migration, working alongside their architects and engineers rather than handing over a deck and walking away. If your modern data stack is being asked to do something it wasn't designed for, the answer isn't another strategy review. It's getting your hands on the platform and making it ready.

The organisations that move first on context infrastructure will be the ones whose agents can actually be trusted.