Your $2,000/Month Analytics Tool Costs $20 in Infrastructure — I Did the Math
A step-by-step breakdown of what actually happens to your data (and why SaaS companies pray you never learn this)
It costs a tool like Hyros/Triple Whale roughly $20 a month in infrastructure to handle your data.
Most brands pay around $1,000–$2,000.
The difference isn't driven by cost; it's driven by what the market is willing to pay.
Every analytics system you use is built on the same foundation: a collection of software hosted on servers (geek name for computers) that collects, moves, processes, and displays data in a way that makes sense for the business.
Some analytics companies run this software on machines they manage themselves — think servers rented from Hetzner, DigitalOcean, etc.
Others use managed services that package away the complexity and, in most cases, handle scaling for them. Think Google Cloud, Azure, AWS, or Confluent.
Whether it’s running on AWS, Azure, Google Cloud, or a single self-hosted server, the underlying logic of the software behind it is always the same: extract, transform, model, visualise.
The only real variable is how much of that process you want to own — and how much you’re willing to outsource.
This might sound complex — a web of servers, software, and clouds — but it’s actually simple once you see how a modern data stack works.
Part 1: Education — How Modern Data Pipelines Work
Every company today — whether it’s Netflix, Spotify, or your D2C store/info product company — moves data through the same simple pattern.
Before we talk about those layers, let’s start with where that data actually comes from.
1️⃣ Where Data Is Created
Data is born in three main places (this applies mainly to online businesses):
a) Your own digital real estate — your website or app
When someone clicks, watches, scrolls, or buys, that behaviour is captured by tracking tools like Segment, Google Analytics, or Hyros, using small JavaScript snippets installed on your pages.
These are called behavioural data collectors — they observe what users do. Most of that observation happens client-side, meaning inside the user’s browser. Some can also be done server-side, but not everything.
This is the data source where you have the least control, since its generation happens on devices and environments you don’t fully own.
b) Your core systems of record — CRM, checkout, and payments
Platforms like Shopify, Stripe, bank accounts, or your internal CRM hold the ground truth: how much was paid, by whom, and when.
Unlike browser data, this information lives in systems of record — it's precise, verified, and legally accountable.
For the sake of clarity, we can call this atomic data — the kind that doesn’t depend on pixels or models.
No matter what Hyros or Triple Whale tell you, every business owner eventually finds the real truth in one place: the bank account.
c) Third-party tools — external platforms you use every day
Advertising networks, email platforms, help-desk systems, affiliate trackers — all of them generate data about what’s happening off-site.
This is usually the hardest to unify because it lives on other people’s infrastructure, following their predefined data schemas that you have to fit into your own system.
Together, these three sources form the majority of your company’s raw data generation.
2️⃣ How That Data Moves
Once created, all that information has to travel through four layers so you can actually use it to make decisions:
| Layer | Question It Answers | What It Works With / Examples |
|---|---|---|
| 1. Collection (Extraction) | “How do we pull data from all those places?” | Data sources: APIs, JavaScript snippets, webhooks. Tools: Meltano, Airbyte, Fivetran, Segment |
| 2. Storage | “Where do we keep it safely?” | Data warehouses like BigQuery, Snowflake, Postgres, or ClickHouse |
| 3. Modeling (Transformation) | “How do we clean, join, and structure it so it makes sense?” | dbt, SQL models, transformation scripts |
| 4. Visualization | “How do we show it to humans?” | Dashboard tools like Lightdash, Metabase, Looker |
(Quick note: ETL stands for Extract, Transform, Load — the classic pattern for moving and preparing data.
Modern stacks often flip it to ELT — Extract, Load, then Transform — since most cleaning now happens inside powerful cloud warehouses.)
Now let’s unpack each of these steps.
1. Collection (Extraction Layer)
The first step is getting data out of all the systems where it’s created.
There are two sides to this layer:
| Component | What It Does | Examples |
|---|---|---|
| Data Sources | Where the data lives — tools you already use. | Google Ads, Meta, Shopify, Stripe, Klaviyo, your website/app via scripts like Segment or Hyros |
| ETL Tools | The software that connects to those sources, extracts data, and passes it downstream. | Meltano, Airbyte, Fivetran, custom scripts |
How it works in simple terms:
Tracking scripts (like Segment or Hyros) capture behavioural data from your site or app.
APIs pull system data (like sales from Stripe or Shopify).
ETL tools schedule and automate these pulls so your warehouse always stays up to date.
The output of this layer is a set of raw data tables — one for each source — ready to be stored and processed.
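To make that concrete, here is a hedged sketch of what one of those raw tables can look like once an ETL tool has landed it in the warehouse. The SQL is BigQuery-flavoured, and the project, dataset, and column names are hypothetical placeholders, not what any specific connector actually produces.

```sql
-- Hypothetical raw table landed by an ETL sync (Airbyte/Fivetran-style).
-- Assumes the connector has already converted Stripe's Unix timestamps to TIMESTAMP.
SELECT
  id           AS charge_id,    -- Stripe's own identifier for the payment
  amount / 100 AS amount_usd,   -- Stripe stores amounts in cents
  currency,
  created      AS charged_at
FROM `my_project.raw_stripe.charges`   -- hypothetical raw table, one per source object
WHERE DATE(created) = CURRENT_DATE()
ORDER BY created DESC
LIMIT 100;
```

One table like this per source (charges, orders, ad spend, events) is all this layer has to deliver.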
2. Storage (Where Data Lives)
Once data is collected, you need to put it somewhere — a home where everything can be stored, queried, and shared safely.
That’s the Storage layer.
| Component | What It Does | Examples |
|---|---|---|
| Data Warehouse / Database | Stores all the raw data coming from ETL tools in one centralized place. | BigQuery, Snowflake, Postgres, ClickHouse |
| Data Lake (optional) | Used for large or unstructured data (logs, images, events). | S3, GCS, Azure Data Lake |
| File Storage / Backups | Keeps historical copies or archives for compliance and recovery. | Cloud Storage buckets, S3 cold storage, CSV exports |
The storage layer acts as your central source of truth — where all marketing, sales, and product data lives together.
Instead of having ad spend in one app and orders in another, everything now sits inside a single warehouse, time-aligned and ready to be analyzed.
A few simple points:
Analytics = Data Warehouse. For analytical use cases, data warehouses are the best option because they allow insanely fast querying of massive datasets, thanks to their columnar design. We don’t need to go deep into that — just remember: for analytics, you need a warehouse.
Storage is cheap; compute is what costs. In modern managed warehouses (think Google Cloud, AWS, Azure, Snowflake), storing data is almost free — what costs money is compute.
What does compute mean? When you want to extract insight from data, you write a query (SQL) that selects or aggregates certain values.
Executing that query requires CPU power, memory, and temporary processing resources.
In other words: storing doesn’t cost much, but querying (doing the work) is what generates cost. This is the model followed by leaders like BigQuery and Amazon Redshift.
Of course, if you don’t want to use a managed service, you can self-host a warehouse like ClickHouse on your own server and only pay for that machine.
The output of this step is a single warehouse — all your raw data safely stored and ready for modeling.
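Because you pay for the bytes a query scans, the standard trick is to partition tables by date so daily jobs only touch a single day of data. A minimal BigQuery-flavoured sketch, with hypothetical dataset and column names:

```sql
-- Land raw web events into a table partitioned by event date.
CREATE TABLE IF NOT EXISTS `my_project.raw_events.web_events`
PARTITION BY DATE(event_timestamp)
AS
SELECT * FROM `my_project.raw_events.web_events_staging`;

-- A daily check like this scans only yesterday's partition, so the ~$5/TB
-- on-demand pricing applies to a few megabytes rather than the full history.
SELECT COUNT(*) AS events_yesterday
FROM `my_project.raw_events.web_events`
WHERE DATE(event_timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
```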
3. Modeling (Transformation Layer)
If you’ve followed my previous articles, you know that in the modeling phase you actually start from the end of your thinking — not the beginning.
You don’t open your data warehouse and start writing SQL.
You start with the value chain of your business — how value is created, captured, and measured — and then you reverse-engineer it back into your data models.
You visualise your business as a system of flows:
ads → traffic → leads → customers → revenue → retention.
Then your job is to stitch those parts together in your warehouse so every node in that chain is backed by clean, connected data.
It’s dead simple: you translate your business logic into data logic, based on resources that you have available in warehouses.
You’re not just cleaning or organising data; you’re expressing how your business works in SQL.
| Component | What It Does | Examples |
|---|---|---|
| Transformation Tools | Run SQL or code to clean, combine, and structure data for analysis. | dbt, SQLMesh, Dataform, custom SQL scripts |
| Business Logic Models | Define key entities and metrics (customer, order, conversion, LTV, CAC). | dbt models, materialized views, CTE chains |
| Testing / Validation | Check that models are correct and consistent before use. | dbt tests, Great Expectations, assertions in SQL |
Modeling is also where you start to standardize definitions — something most teams skip.
If five people calculate “LTV” differently, you don’t have a reporting problem; you have a modeling problem.
A few points worth remembering:
Modeling creates the “truth layer.” It’s where raw data becomes useful context.
It’s an iterative process. Every business change eventually cascades down to this layer.
It’s creative work. The best models aren’t perfect SQL — they’re clear representations of how your company thinks.
The output of this step is a clean, analytics-ready data layer — everything named, joined, and defined in a way that matches how your business actually operates.
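To show what "business logic expressed in SQL" means in practice, here is a hedged sketch of a dbt-style model for one link in that chain: daily ad spend against revenue per channel. Every project, dataset, table, and column name is a hypothetical placeholder for your own staging layer.

```sql
-- models/daily_channel_roas.sql (dbt-style sketch; all names are hypothetical)
WITH spend AS (
  SELECT
    spend_date       AS day,
    channel,
    SUM(cost_usd)    AS ad_spend
  FROM `my_project.staging.ad_spend`      -- unified Google + Meta spend
  GROUP BY 1, 2
),
revenue AS (
  SELECT
    DATE(ordered_at)      AS day,
    attribution_channel   AS channel,
    SUM(total_usd)        AS revenue
  FROM `my_project.staging.orders`        -- Shopify / Stripe orders
  GROUP BY 1, 2
)
SELECT
  s.day,
  s.channel,
  s.ad_spend,
  COALESCE(r.revenue, 0)                             AS revenue,
  SAFE_DIVIDE(COALESCE(r.revenue, 0), s.ad_spend)    AS roas  -- NULL instead of divide-by-zero
FROM spend AS s
LEFT JOIN revenue AS r
  ON r.day = s.day
 AND r.channel = s.channel;
```

In a real dbt project the backticked table references would be ref() calls to upstream models, but the idea is the same: the model is just your business logic written down.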
4. Visualization (How Data Becomes Readable)
Once data has been modeled, it needs a visual surface — a way for people to read and understand it.
That's the Visualization layer.
| Component | What It Does | Examples |
|---|---|---|
| BI / Dashboard Tools | Connect directly to your modeled data and display metrics through charts, tables, and dashboards. | Lightdash, Metabase, Looker, Superset |
| Embedded Analytics / Reports | Turn insights into internal or client-facing reports. | Data Studio, Sheets, PDF exports |
| Exploration Interfaces | Allow deeper analysis or ad-hoc queries. | Mode, Hex, Tableau, Observable |
Most of you have been around long enough to play with Google Data Studio, Power BI, and Tableau.
Essentially, this is where you tell the underlying story and emphasise the signals the data is sending.
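Under the hood, a dashboard tile in Lightdash or Metabase ends up issuing something like the query below against the modeled layer. It reuses the hypothetical daily_channel_roas model sketched in the modeling section, so the table name is an assumption, not a real artifact.

```sql
-- The kind of query a dashboard tile ultimately runs against the modeled layer.
SELECT
  day,
  channel,
  ad_spend,
  revenue,
  roas
FROM `my_project.analytics.daily_channel_roas`
WHERE day >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
ORDER BY day, channel;
```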
Part 2: The Actual Cost to Serve One Client
Now that you understand how the data flows, let’s look at what it actually costs a tool like Hyros or Triple Whale to process it.
We’ll use a realistic example — a mid-size eCommerce business that:
Generates around 1 million events per month (pageviews, clicks, purchases, etc.)
Spends roughly $300,000/month on ads across Google and Meta
Uses Shopify and Stripe for payments
That’s a typical client for these analytics tools.
Let’s see what happens behind the scenes.
1️⃣ Collection — Getting the Data
When data is created, something needs to receive it — that’s the job of servers running tracking scripts and ETL processes.
In this phase, Hyros or Triple Whale handle:
Tracking scripts: lightweight JavaScript snippets running on your site that send events to their ingestion API.
API integrations: connectors pulling spend, campaign, and conversion data from Google Ads, Meta, Shopify, Stripe, etc.
ETL orchestration: small virtual machines or containers (think AWS Lambda, Cloud Run, or a shared EC2 instance) that schedule and run these syncs.
Each of those requires compute — CPU, memory, and bandwidth to process requests — but at this scale, the footprint is minimal.
Typical managed cost (1M events/month):
Serverless ingestion or Cloud Run jobs: $3–$5
API requests + bandwidth: $1–$2
Total: ≈ $5–$7/month
✅ Collection Cost: ~$5–$7/month
2️⃣ Storage — Where It Lives
Once collected, the data lands in a data warehouse — usually something like BigQuery, Snowflake, or ClickHouse.
This is where your ad spend, transactions, and events are stored and later processed.
At this scale (about 1M events ≈ 3–5 GB per month), storage costs almost nothing — but remember:
cloud warehouses also charge you when you query that data.
Storage pricing: BigQuery charges around $0.02 per GB per month for stored data.
Query pricing: BigQuery charges about $5 per terabyte scanned (roughly $0.005 per gigabyte).
So if your monthly queries scan around 1–2 TB of data — which is typical for daily transformations and dashboards — that’s roughly $5–$10/month in compute cost tied to this layer.
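If you want to sanity-check that arithmetic, it is simple enough to run in the warehouse itself; the volumes below are just the mid-points assumed above.

```sql
-- Back-of-the-envelope check of the storage vs. compute split at this scale.
SELECT
  4   * 0.02 AS storage_cost_usd,  -- ~4 GB stored  x ~$0.02/GB/month ≈ $0.08
  1.5 * 5.00 AS query_cost_usd;    -- ~1.5 TB scanned x ~$5/TB        ≈ $7.50
```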
✅ Storage + Query Cost: ~$5–$10/month
3️⃣ Modeling — Making It Usable
Now that the data is stored, it has to be joined and processed into reports. This means running SQL queries or transformation jobs that calculate spend, revenue, and attribution.
In Hyros or Triple Whale’s case, this is where they run:
Attribution logic (which ad or campaign gets credit for a sale)
Aggregations (daily spend, ROAS, CPA, LTV by source)
Data validation (checking for missing IDs, duplicate events, etc.)
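To demystify the first item on that list, here is a hedged sketch of what a simple last-click attribution query can look like. The table names, columns, and the 7-day lookback window are assumptions for illustration, not how Hyros or Triple Whale actually implement it.

```sql
-- Last-click attribution: credit each order to the most recent ad click
-- by the same visitor within a 7-day window (all names hypothetical).
SELECT
  o.order_id,
  o.total_usd,
  ARRAY_AGG(c.campaign_id ORDER BY c.clicked_at DESC LIMIT 1)[OFFSET(0)] AS last_click_campaign
FROM `my_project.staging.orders`    AS o
JOIN `my_project.staging.ad_clicks` AS c
  ON  c.visitor_id = o.visitor_id
  AND c.clicked_at BETWEEN TIMESTAMP_SUB(o.ordered_at, INTERVAL 7 DAY) AND o.ordered_at
GROUP BY o.order_id, o.total_usd;
```

Aggregations like daily ROAS, CPA, or LTV by source are then just GROUP BYs layered on top of outputs like this.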
Every time these queries run, the warehouse spins up compute — measured in slots or bytes scanned (BigQuery) or credits (Snowflake).
But at this data size, it’s still minimal.
Typical managed cost:
Compute for daily transformations: $3–$5/month
Occasional ad-hoc recalculations: $1–$2/month
✅ Modeling Cost: ~$5–$7/month
4️⃣ Visualization — Making It Readable
After the data is modeled, it needs a front end to display it — charts, tables, dashboards.
That’s the Visualization layer.
In products like Hyros or Triple Whale, this is simply a web application (likely React or Next.js) that queries the warehouse through an API and renders visual components.
There’s no heavy traffic or global content distribution — each account has maybe a few daily users.
The real resources used here are:
A small shared web server or container running the dashboard app
Minimal database/API queries when someone opens a report
A few megabytes of bandwidth to serve charts and numbers
Even on managed hosting (Vercel, Cloud Run, or similar), this layer costs well under $1 per account per month — often measured in cents.
✅ Visualization Cost: <$1/month
Part 3: So Why Do They Charge $2,000?
Because they can — and honestly, good for them.
They've managed to bundle four different layers of technology — collection, storage, modeling, and visualisation — into one product that looks simple, even if what's underneath isn't.
For most brands, that’s not just software; it’s a replacement for a data team they’d never be able to hire.
A competent data engineer and analyst would easily cost a company $100,000+ per year, not counting infrastructure or ongoing maintenance.
Hyros or Triple Whale, at $24,000 per year, suddenly looks like a bargain.
And they've done it smartly — they've narrowed their focus to a few clear verticals: the eCommerce and info-product space.
That means similar data sources, similar funnels, and similar tech stacks.
They don’t have to build for infinite use cases — just for a few recurring ones.
The pricing model — charging by data volume — and their ability to get away with it show just how strong demand is, and how deeply ingrained the illusion about what these tools deliver is in the market.
At this point, the obvious counter-argument appears:
“Sure, infrastructure is cheap — but that’s not their only cost. They have marketing, sales, support, and customer success.”
And that’s true.
These companies spend a fortune on marketing — sponsoring influencers (shills), running paid ads, building entire education funnels around the illusion of “finally seeing everything clearly.”
They sell certainty, not infrastructure.
Then, downstream, they have customer success teams, onboarding programs, and “optimization calls” — all built to teach you how to think inside their framework. Essentially, if you are not getting the value that you expected, it’s because you don’t know how to use it properly.
That’s also where the main trade-off comes in.
When you adopt a bundled analytics tool, you’re not just outsourcing the technology — you’re outsourcing the thinking behind it.
By default, you’re accepting their template.
Instead of building your data system around how your business actually works, you adapt your business to fit how their data system expects it to work.
That's why, in the modern stack, each of the four layers — collection, storage, modeling, visualization — has its own ecosystem of specialized tools.
Every layer has its own complexity, and open-source or modular tools exist to let you handle that complexity in your own way.
Tools like Airbyte or Meltano let you define what to extract.
dbt lets you decide how to model.
Lightdash or Metabase let you decide how to visualize.
They exist to enable your thinking — not replace it.
With tools like Hyros or Triple Whale, that logic is reversed.
You’re boxed into one environment that controls all four layers.
You can’t tweak or extend it — you just work within the frame they give you.
Say you wanted to tag your ads in Google or Facebook based on emotional tone — “happy,” “curious,” “urgent” — and then analyze performance across those emotions.
In a modular setup, you’d just adjust your extraction schema to include those tags and connect them downstream in your models.
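In practice, "adjusting the schema" can be as small as one extra mapping table and one extra join. A hedged sketch, with every table and column name hypothetical:

```sql
-- A hand-maintained (or connector-extended) mapping of ads to emotional tone.
CREATE TABLE IF NOT EXISTS `my_project.staging.ad_emotion_tags` (
  ad_id   STRING,
  emotion STRING   -- e.g. 'happy', 'curious', 'urgent'
);

-- Performance by emotional tone, sliced however your own thinking demands.
SELECT
  t.emotion,
  SUM(p.cost_usd)                                              AS spend,
  SUM(p.attributed_revenue_usd)                                AS revenue,
  SAFE_DIVIDE(SUM(p.attributed_revenue_usd), SUM(p.cost_usd))  AS roas
FROM `my_project.staging.ad_performance`  AS p
JOIN `my_project.staging.ad_emotion_tags` AS t
  ON t.ad_id = p.ad_id
GROUP BY t.emotion;
```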
In Hyros or Triple Whale, you can’t.
You don’t control the extraction layer.
You can’t change what their API connector pulls, or how it’s modeled, or how it appears in the dashboard.
That’s the invisible limitation of these all-in-one tools:
they can only show you what they’ve already decided matters.
So yes — they charge more, not because they do more,
but because they’ve convinced you that their way is the right way.
And as long as that illusion holds, the market will happily pay for it.
Part 4: The Resolution
Using Hyros or Triple Whale isn’t wrong.
We’ve used them ourselves for clients.
They’ve solved real problems — offline uploads into ad networks, decent attribution models, integrations that save a ton of setup time.
And lately, some of them are even opening small doors for analysts — Triple Whale now lets you run SQL directly, which is a good step forward.
But the deeper issue with these tools isn’t technical.
It’s philosophical.
You don’t find insight by looking at dashboards.
You start with a hypothesis — “customers who watch our founder story convert better” — and then you build the measurement to test it.
Tools like Hyros or Triple Whale can’t do that for you.
They can only measure what they’ve been programmed to measure.
They show you everyone’s metrics, which means you end up with everyone’s insights.
Real advantage comes from measuring different things — things that reflect how your business creates value.
When you can afford it, build your own data pipeline.
Not because it’s cheaper — it rarely is — but because it lets you encode your thinking into your measurement system.
It’s how you turn data from something you rent into something you actually understand.
The $2,000 these tools charge isn’t about infrastructure.
The servers, storage, and compute that power them cost maybe $20 a month.
What you’re paying for is the abstraction — the idea that you can replace a data analyst, skip the thinking, and still get the answers.
That’s the real trade-off:
$20 worth of infrastructure, $1,980 worth of interpretation.
And that’s fine — as long as you understand what you’re buying.
You’re not buying better data.
You’re buying someone else’s way of looking at it.
If you have any questions, please feel free to reach out.