
What's In a Semantic Starter Kit

How pre-built data models compress a year of work into weeks

Most companies need roughly the same first year of data projects. Product revenue mix. Customer churn. Cohort analysis. Segmentation. The specific numbers differ, but the patterns are universal. So why does every data team build them from scratch?

The first-year problem

When a data team stands up a new analytics program, whether that's a startup hiring its first analyst or an enterprise migrating to a new platform, the first year follows a predictable pattern. Month one: connect to the source systems. Month two: figure out what "revenue" means (it's never as simple as it sounds). Months three through eight: build the core business models. Months nine through twelve: build dashboards, fix the models, and start the next set.

This timeline isn't unusual. It's the norm. And most of that work is reinventing models that thousands of data teams have built before, just with different column names.

What a starter kit actually contains

A semantic starter kit is a pre-built data architecture designed for a specific industry or business function. It's not a template or a diagram. It's a working system that a data team can deploy, connect to their source systems, and start using.

Here's what's inside:

1. Medallion architecture (Bronze, Silver, Gold)

The kit includes a complete three-layer data model. The bronze layer captures raw data from source systems. The silver layer cleans, conforms, and joins that data using the business's vocabulary. The gold layer aggregates and shapes data for specific use cases like dashboards, reports, and machine learning features.

Each layer has a clear contract: what comes in, what goes out, and what transformations happen in between. This isn't just a schema. It's a design philosophy that separates concerns and makes the system maintainable as it grows.
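To make the three layers and their contracts concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a warehouse platform. All table names, columns, and sample rows (`bronze_orders`, `silver_orders`, `gold_monthly_revenue`) are hypothetical illustrations, not the kit's actual schema, and the "contract" is reduced to a simple expected-column check.

```python
import sqlite3

# Hypothetical example: table and column names are illustrative assumptions,
# not the kit's actual schema. An in-memory SQLite DB stands in for a warehouse.
conn = sqlite3.connect(":memory:")

# Bronze: raw rows exactly as the source delivered them (strings, duplicates, mixed case).
conn.execute(
    "CREATE TABLE bronze_orders "
    "(order_id TEXT, customer_id TEXT, amount TEXT, status TEXT, ordered_at TEXT)"
)
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "c1", "100.00", "COMPLETE", "2024-01-15"),
        ("1", "c1", "100.00", "COMPLETE", "2024-01-15"),  # duplicate delivery
        ("2", "c2", "50.50", "complete", "2024-01-20"),
        ("3", "c1", "75.00", "refunded", "2024-02-03"),
        ("4", "c3", "20.00", "Complete", "2024-02-10"),
    ],
)

# Silver: typed, deduplicated, and conformed to one status vocabulary.
conn.execute("""
    CREATE TABLE silver_orders AS
    SELECT DISTINCT
        CAST(order_id AS INTEGER) AS order_id,
        customer_id,
        CAST(amount AS REAL) AS amount,
        LOWER(status) AS status,
        DATE(ordered_at) AS ordered_at
    FROM bronze_orders
""")

# Gold: shaped for one use case, here monthly revenue from completed orders.
conn.execute("""
    CREATE TABLE gold_monthly_revenue AS
    SELECT STRFTIME('%Y-%m', ordered_at) AS month, SUM(amount) AS revenue
    FROM silver_orders
    WHERE status = 'complete'
    GROUP BY month
    ORDER BY month
""")

# A simplified layer "contract": the exact columns each layer must expose.
CONTRACTS = {
    "silver_orders": ["order_id", "customer_id", "amount", "status", "ordered_at"],
    "gold_monthly_revenue": ["month", "revenue"],
}

def check_contract(conn, table, expected):
    """Raise if the table's columns don't match the agreed contract."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if cols != expected:
        raise ValueError(f"{table}: expected {expected}, got {cols}")

for table, expected in CONTRACTS.items():
    check_contract(conn, table, expected)

print(conn.execute("SELECT * FROM gold_monthly_revenue").fetchall())
# → [('2024-01', 150.5), ('2024-02', 20.0)]
```

The duplicate and the inconsistent casing are handled once, in silver, so every gold table built on top inherits the same cleaned data instead of re-solving those problems.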

2. Transformation logic

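Between each pair of layers sits transformation code (SQL, Python, or platform-native logic depending on your stack). The starter kit includes both physical transformations (materialized tables) and virtual transformations (views), so teams can choose the right trade-off between query performance and freshness: a materialized table is fast to query but only as current as its last refresh, while a view always reflects the latest data but is recomputed on every query.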

This logic encodes the business rules: how "revenue" gets calculated, what counts as "active," when a customer is considered "churned." These definitions are the most valuable part of any data model, and they're the part most teams spend the longest building.
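As an illustration of both points, a business rule defined once and deployed as either a view or a materialized table, here is a small sketch with Python's `sqlite3`. The churn definition (no order in the 90 days before an as-of date), the names `customer_churn_v` / `customer_churn_t`, and `refresh_materialized` are all hypothetical assumptions standing in for whatever definition a business actually agrees on.

```python
import sqlite3

# Illustrative only: the 90-day churn window, the as-of date, and all names
# are assumptions, not the kit's actual definitions.
CHURN_WINDOW_DAYS = 90
AS_OF = "2024-06-30"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, ordered_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("c1", "2024-06-01"),
    ("c2", "2024-01-10"),
    ("c3", "2024-03-15"),
])

# The business rule lives in one place: a customer is churned if their most
# recent order is more than CHURN_WINDOW_DAYS before the as-of date.
# (Constants are trusted and inlined because SQLite views can't take parameters.)
CHURN_SQL = f"""
    SELECT customer_id,
           MAX(ordered_at) AS last_order,
           JULIANDAY('{AS_OF}') - JULIANDAY(MAX(ordered_at)) > {CHURN_WINDOW_DAYS} AS churned
    FROM orders
    GROUP BY customer_id
"""

# Virtual transformation: a view recomputes on every query, so it is always fresh.
conn.execute(f"CREATE VIEW customer_churn_v AS {CHURN_SQL}")
# Physical transformation: a table is computed once, fast to read, stale until rebuilt.
conn.execute(f"CREATE TABLE customer_churn_t AS {CHURN_SQL}")

def refresh_materialized(conn):
    """Fully rebuild the physical table from the same rule."""
    conn.execute("DELETE FROM customer_churn_t")
    conn.execute(f"INSERT INTO customer_churn_t {CHURN_SQL}")

def churned_customers(conn, relation):
    rows = conn.execute(f"SELECT customer_id FROM {relation} WHERE churned = 1")
    return sorted(r[0] for r in rows)

# New data arrives: c2 orders again, so it should no longer count as churned.
conn.execute("INSERT INTO orders VALUES ('c2', '2024-06-20')")

print(churned_customers(conn, "customer_churn_v"))  # → ['c3'] (view sees the new order)
print(churned_customers(conn, "customer_churn_t"))  # → ['c2', 'c3'] (table is stale)
refresh_materialized(conn)
print(churned_customers(conn, "customer_churn_t"))  # → ['c3']
```

Because both objects are built from the same `CHURN_SQL`, switching a model between virtual and physical changes its performance and freshness, never its definition of "churned."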

3. Observability and testing

Every data pipeline breaks eventually. The question is whether you find out before or after someone makes a decision based on bad data. The starter kit includes test scripts that validate primary key uniqueness, foreign key relationships, business rule invariants, and data freshness. It also includes observability queries for monitoring pipeline health over time.
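The four kinds of checks listed above can be sketched as queries that return offending rows, so an empty result means the check passed. This is a minimal example with Python's `sqlite3`; the tables, the non-negative-amount invariant, the `loaded_at` column, and the six-hour freshness threshold are hypothetical choices, not the kit's actual test suite.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical tables and thresholds. Each check returns offending keys,
# so an empty list means the check passed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id TEXT, amount REAL, loaded_at TEXT);
    INSERT INTO customers VALUES ('c1'), ('c2');
""")
now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "c1", 100.0, (now - timedelta(hours=2)).isoformat()),
    (2, "c2", 50.0, (now - timedelta(hours=1)).isoformat()),
    (2, "c2", 50.0, (now - timedelta(hours=1)).isoformat()),  # duplicate primary key
    (3, "c9", -5.0, (now - timedelta(hours=1)).isoformat()),  # unknown customer, negative amount
])

CHECKS = {
    # Primary key: order_id must be unique.
    "pk_unique_orders": "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1",
    # Foreign key: every order must point at a known customer.
    "fk_orders_customer": """
        SELECT o.order_id FROM orders o
        LEFT JOIN customers c ON o.customer_id = c.customer_id
        WHERE c.customer_id IS NULL""",
    # Business rule invariant: amounts are never negative.
    "invariant_non_negative_amount": "SELECT order_id FROM orders WHERE amount < 0",
}

def run_checks(conn, checks):
    return {name: [r[0] for r in conn.execute(sql)] for name, sql in checks.items()}

def is_fresh(conn, now, max_age):
    """Freshness: the newest load must be no older than max_age."""
    (latest,) = conn.execute("SELECT MAX(loaded_at) FROM orders").fetchone()
    return latest is not None and now - datetime.fromisoformat(latest) <= max_age

failures = {name: rows for name, rows in run_checks(conn, CHECKS).items() if rows}
print(failures)
print(is_fresh(conn, now, max_age=timedelta(hours=6)))
```

Returning the offending keys rather than a bare pass/fail makes each failure directly actionable, and running the same queries on a schedule and recording the counts is one simple way to get the over-time health view described above.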

4. Dashboard templates

Reference visualizations that work out of the box with the gold layer. These aren't production dashboards. They're starting points that show the data team what's possible and give business users something concrete to react to. Iteration is faster when you start with something to critique rather than a blank canvas.

5. Data catalog metadata

Full documentation of every table, column, and business rule, with lineage tracing each metric back to its source. This metadata integrates with data catalog tools and ensures the kit is self-documenting. When someone asks "where does this number come from?" the answer is built into the system.
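Lineage is what lets the catalog answer "where does this number come from?" Here is a minimal sketch, assuming each catalog entry simply records its upstream dependencies; the `CatalogEntry` class, the entries, and `trace_to_sources` are hypothetical illustrations, and a real kit would export this metadata to a catalog tool rather than keep it in a Python dict.

```python
from dataclasses import dataclass, field

# Hypothetical catalog entries; names and descriptions are illustrative only.
@dataclass
class CatalogEntry:
    name: str
    description: str
    upstream: list = field(default_factory=list)  # names this entry is derived from

CATALOG = {
    e.name: e for e in [
        CatalogEntry("crm.orders", "Raw order export from the CRM (source system)."),
        CatalogEntry("billing.refunds", "Raw refund events from billing (source system)."),
        CatalogEntry("bronze_orders", "Orders as landed, unchanged.", ["crm.orders"]),
        CatalogEntry("bronze_refunds", "Refunds as landed, unchanged.", ["billing.refunds"]),
        CatalogEntry("silver_orders", "Deduplicated, typed orders net of refunds.",
                     ["bronze_orders", "bronze_refunds"]),
        CatalogEntry("gold_monthly_revenue.revenue", "Sum of completed order amounts per month.",
                     ["silver_orders"]),
    ]
}

def trace_to_sources(name, catalog):
    """Walk upstream links depth-first and return every path from `name` to a source.

    Assumes the lineage graph is acyclic, as a layered model should be.
    """
    entry = catalog[name]
    if not entry.upstream:  # no parents: this is a source system
        return [[name]]
    paths = []
    for parent in entry.upstream:
        for path in trace_to_sources(parent, catalog):
            paths.append([name] + path)
    return paths

for path in trace_to_sources("gold_monthly_revenue.revenue", CATALOG):
    print(" <- ".join(path))
# gold_monthly_revenue.revenue <- silver_orders <- bronze_orders <- crm.orders
# gold_monthly_revenue.revenue <- silver_orders <- bronze_refunds <- billing.refunds
```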

6. LLM-optimized model

A version of the data model specifically designed for natural language queries. Column names, table descriptions, and relationship metadata are structured so that an LLM can generate accurate SQL from business questions. This is the layer that makes "show me revenue by region for Q3" actually work.
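One way to picture an LLM-optimized model: descriptions, units, and join paths are stored with the schema and rendered into the prompt, so a text-to-SQL model doesn't have to guess what `revenue_usd` means or how tables relate. This sketch only builds the prompt text; it makes no model call, and the metadata layout, the table names, and the `render_schema_context` / `build_prompt` helpers are hypothetical assumptions rather than the kit's actual format.

```python
# Hypothetical metadata layout; table and column names are illustrative only.
SEMANTIC_MODEL = {
    "tables": {
        "gold_revenue_by_region": {
            "description": "Recognized revenue per sales region per fiscal quarter.",
            "columns": {
                "region_id": "Foreign key to dim_region.region_id.",
                "fiscal_quarter": "Fiscal quarter label, e.g. '2024-Q3'.",
                "revenue_usd": "Recognized revenue in US dollars, net of refunds.",
            },
        },
        "dim_region": {
            "description": "Sales regions.",
            "columns": {
                "region_id": "Primary key.",
                "region_name": "Human-readable region name, e.g. 'EMEA'.",
            },
        },
    },
    "relationships": [
        ("gold_revenue_by_region.region_id", "dim_region.region_id"),
    ],
}

def render_schema_context(model):
    """Render the semantic model as plain text for a text-to-SQL prompt."""
    lines = []
    for table, meta in model["tables"].items():
        lines.append(f"TABLE {table}: {meta['description']}")
        for column, desc in meta["columns"].items():
            lines.append(f"  - {column}: {desc}")
    lines.append("JOINS:")
    for left, right in model["relationships"]:
        lines.append(f"  - {left} = {right}")
    return "\n".join(lines)

def build_prompt(question, model):
    """Combine instructions, schema context, and the user's question."""
    return (
        "Write one SQL query that answers the question using only these tables.\n\n"
        f"{render_schema_context(model)}\n\nQUESTION: {question}\nSQL:"
    )

print(build_prompt("Show me revenue by region for Q3", SEMANTIC_MODEL))
```

The explicit join line and the example values ("2024-Q3", "EMEA") are the kind of detail that tends to separate a correct generated query from a plausible-looking wrong one.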

Why not just build from scratch?

You can. And sometimes you should, especially if your business model is truly novel or your data sources are unusual. But for the 80% of data projects that follow established patterns, building from scratch means spending months recreating work that's been done thousands of times.

The starter kit doesn't eliminate custom work. It eliminates the commodity work so your team can focus on what's actually unique about your business. Instead of spending six months building a standard revenue model, you spend a month adapting one, and the remaining five months solving problems nobody else has solved yet.

Technology agnostic, opinion included

Our starter kits work across data platforms, though platforms with broader feature sets (Databricks, Microsoft Fabric, Snowflake) are the easiest to deploy on. The architecture patterns are platform-agnostic. The implementation details adapt to your stack.

We're opinionated about architecture but flexible about tooling. If you already have a platform, we'll work with it. If you're choosing one, we'll help you choose based on your specific situation rather than vendor marketing.

Getting started

If your team is about to start a new analytics program, or if you're six months in and feeling like everything is taking too long, a starter kit can compress your timeline significantly. Purchase one outright, subscribe for ongoing updates, or hire us to manage the implementation.
