Insight

Data Modeling Should Be Readable

The typical approach to data modeling has three layers: Conceptual, Logical, and Physical. Conceptual expresses data in purely business terms. Logical shows the same business layer in a more formal entity-attribute model. Physical maps it to a particular technology. In practice, most teams skip the first two and start with DDL. That's the root of the problem.

The readability deficit

I spent 13 years leading data organizations, most recently as Group SVP of Global Data Platforms at The Adecco Group. Across every engagement, the same thing happened: a data engineer would build a schema, a business stakeholder would ask what it meant, and the answer required a 45-minute meeting. The schema was correct. It was also illegible to the people who needed to trust it.

This isn't a tooling complaint. It's a structural one. SQL DDL was designed to instruct a database engine, not to communicate meaning. YAML-based semantic layers improved on this by separating metric definitions from physical storage, but they introduced their own syntax, another language for humans to parse. The result is that most data models exist as artifacts that only their authors can fully interpret.

When I began studying semantic architecture, it became clear that the most useful thing to communicate in a data model was not the physicality (not the column types or join conditions) but the relationships between business concepts. And if relationships are the point, they should be expressed in a way that doesn't require a data engineering background to read.

What readability actually means

A readable data model isn't a diagram. Diagrams are pictures of models, not the models themselves, and they drift from reality the moment someone edits a migration script. A readable model is one where the source of truth is written in language that a product manager, a finance director, or an LLM can parse without translation.

That's the design principle behind Eloquent, an open-source semantic data modeling language I built to solve this problem. The entire syntax is natural language relationships. Here's what a basic model looks like:

- Order has a Customer
- Order has a Price
- Order has an Order Date
- Customer has a Customer Name
- Customer Name is a String
- Price is a Number
- Order Date is a Date

Seven lines. No curly braces, no indentation rules, no special characters beyond the dash. A business analyst can read this and know immediately what it describes. A data engineer can read it and see the entity-attribute structure. An LLM can read it and generate accurate SQL against it.

The entire language has three relationship types: has a for composition, is a (between entities) for inheritance, and is a [Type] for physical typing. That's it. Three relationships describe the conceptual, logical, and physical layers simultaneously, in one model, in plain language.

Why one model matters

The traditional three-layer approach creates a maintenance problem. The conceptual model lives in a slide deck. The logical model lives in an ERD tool. The physical model lives in migration scripts. They diverge immediately and stay diverged forever. Nobody reconciles them because the effort isn't worth it, until an auditor asks or an LLM needs context, and then suddenly it's very worth it and very expensive.

Eloquent collapses these layers into a single source file. From that file, it generates 12+ downstream artifacts: physical schemas, logical views, a knowledge graph, NL-to-SQL translation, SQL tests, data catalog documentation, ERD diagrams, observability scripts, a metrics matrix, RAG context for LLMs, NL-to-SQL training pairs, and dbt-compatible transformations. The model is the documentation. There's no drift because there's no second artifact to drift from.

Here's a slightly richer example with inheritance:

- Customer has a Customer Name
- Customer has a Customer Email
- Customer Name is a String
- Customer Email is a String
- Enterprise Customer is a Customer
- Enterprise Customer has a Contract Value
- Contract Value is a Number
- Order has a Customer
- Order has a Price
- Order has an Order Date
- Price is a Number
- Order Date is a Date

Enterprise Customer inherits everything from Customer (Customer Name, Customer Email) without repeating it. When Eloquent traverses the graph, it knows that an Order's Customer might be an Enterprise Customer with a Contract Value. The physical schema, the views, the tests, and the LLM context all reflect this automatically. You describe the relationship once, and the tooling propagates it everywhere.

The views layer insight

One of the things that drove me toward this approach was an observation about views. Views are partly physical; they exist within a database and are indistinguishable from tables in most query contexts. But they're also partly logical, because they really only model relationships and encapsulate business logic. They sit at the intersection of the conceptual and the physical.

My engineering teams used to spend too much time arguing about whether changing data types counted as EL or T, or what qualified as a bronze, silver, or gold table. Medallion architecture and ETL/ELT patterns are not very useful when the disagreements they generate consume more engineering time than the problems they solve. The real question was never "which layer does this belong to?" It was "what does this data mean, and how does it relate to other data?"

That realization, that the meaningful unit of data architecture is the relationship and not the layer, is what Eloquent encodes. When you write Order has a Customer, you're not specifying a join condition or a foreign key. You're stating a business fact. The join condition, the foreign key, the view definition, the test assertion, and the LLM prompt context are all derived from that single statement.

LLM-ready by construction

Most teams retrofitting their data platforms for LLM integration face a metadata problem. The LLM needs to understand what tables exist, what columns mean, how they relate, and what business questions they can answer. This context usually has to be manually assembled from scattered documentation, tribal knowledge, and reverse-engineered SQL.

When your data model is already written in natural language, this problem disappears. The model itself is the context. Eloquent generates RAG-ready representations and NL-to-SQL training pairs directly from the model file. There's no translation step because the model was never written in a language that needed translating.

This is what I mean by "LLM-ready from day one." It's not a feature bolted onto an existing modeling paradigm. It's a consequence of the design decision to write models in natural language in the first place.

Where this is going

Eloquent is open source, implemented in Python, and available under BSL 1.1. The core language is intentionally small (three relationship types) because the power comes from graph traversal over those relationships, not from syntax complexity. It's technology-agnostic, generating artifacts for PostgreSQL, MySQL, Databricks, Snowflake, Microsoft Fabric, and others.

But the broader point isn't about one tool. It's about the principle: data models should be readable by the people who depend on them. If your model requires a specialist to interpret, it's not serving its full purpose. If your documentation is separate from your model, it's already wrong. And if your LLM integration requires a manual metadata assembly step, you're building on a foundation that doesn't scale.

Readable models aren't a nicety. They're an architectural decision that compounds: in documentation quality, in stakeholder trust, in LLM accuracy, and in the speed at which a data team can move.

Learn About Eloquent Start a Conversation