How to make money with Data?
Exhaustive list of pricing strategies and mental models for data assets
In the digital gold rush of the 21st century, data is the new gold.
But here's the twist: unlike bitcoin and gold, data's value is as elusive as it is enormous.
Giants like Google, Bloomberg, and now AI companies like OpenAI have turned their data into cash cows.
Let's dive into the fascinating world of data pricing strategies, with a special focus on how Software-as-a-Service (SaaS) models and APIs are changing the game. 🏊‍♂️
But before that, let's start by changing the way you currently think about data.
Use these data mental models to get rid of outdated thinking
The Fallacy of Intrinsic Value
We're often told that data, by its mere existence, holds value.
This is a dangerous oversimplification.
In the SaaS realm, raw data is just that—raw. Without the right use case, even the most comprehensive user behavior data or extensive financial metrics are just expensive digital paperweights.
The Use Case Trap
SaaS companies often fall into the trap of data hoarding, assuming more data equals more value. But here's my view: unless you have a laser-focused use case, most of that data is worthless.
Your churn prediction model doesn't care about your customers' favorite colors, no matter how much data you have on it.
The Additive Paradox
Yes, data is additive.
More user profiles, more engagement metrics, more everything.
But here's what we often miss: there's a point of diminishing returns.
After a certain threshold, adding more data can lead to analysis paralysis, slower systems, and less actionable insights.
In SaaS, speed and simplicity often trump exhaustive data.
The First-Mover Reality
In the cutthroat world of SaaS, if your competitor leverages a dataset to improve their product features or pricing strategy, your ability to benefit from that same/similar data plummets.
First-mover advantage is real, and in data terms, it's more pronounced than we'd like to admit.
The Lifecycle Letdown
We love to focus on the "early stage" of data assets, where early adopters reap huge rewards. But let's face it, most SaaS datasets never reach this stage. They languish in the "product market fit" stage, incomplete and slow, or quickly move to the "decay stage", where they're commoditized.
The Marginal Value Mirage
The true measure of data value is its marginal impact on actions. But in our data-obsessed culture, we often mistake correlation for causation. Did your churn rate drop because of that new dataset, or was it your improved onboarding process?
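One way to stay honest about this is to compare customers whose churn model used the new dataset against a holdout group that did not, while both groups saw the same onboarding. The sketch below is a minimal illustration under that assumption; the function names and every number in it are hypothetical.

```python
def churn_rate(churned: int, customers: int) -> float:
    """Share of customers who churned in the period."""
    if customers <= 0:
        raise ValueError("customers must be positive")
    return churned / customers


def marginal_value(control_churn: float, treated_churn: float,
                   customers: int, revenue_per_customer: float) -> float:
    """Revenue retained because of the dataset alone.

    Both groups must share every other change (for example the new
    onboarding flow); otherwise the difference cannot be attributed
    to the dataset.
    """
    customers_saved = (control_churn - treated_churn) * customers
    return customers_saved * revenue_per_customer


# Illustrative numbers only, not real benchmarks.
control = churn_rate(churned=50, customers=1000)  # holdout: no new dataset
treated = churn_rate(churned=40, customers=1000)  # model uses the new dataset
value = marginal_value(control, treated, customers=1000,
                       revenue_per_customer=1200.0)
print(f"Retained revenue attributable to the dataset: ${value:,.0f}")
```

If the two groups differ in anything besides the dataset, the number this prints is a correlation, not a marginal value.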
The Scalability Mirage
We often hear that data value scales with company size.
But this overlooks a critical factor: capability.
A small, agile SaaS startup with a brilliant data science team can extract more value from a dataset than a lumbering enterprise with outdated analytics.
Size isn't everything; it's the sophistication of your data operations that truly matters.
The Uniqueness Deflated
Your data may be unique, but is the insight it delivers? Don't chase fool's gold in the data economy.
Here are some real-world examples:
🏠 The Airbnb Fallacy: Remember when Airbnb thought their host-guest reviews were a goldmine? Turns out, TripAdvisor's hotel reviews and even Yelp's restaurant feedback provided similar traveler insights. Different data, same value.
🚗 The Tesla Trap: Tesla boasts about their fleet's real-world driving data. But insurance companies use simple OBD-II dongles to get comparable data on driver behavior. Tesla's "unique" data? Not so unique after all.
📸 The Instagram Illusion: Instagram thought user-generated photos were their moat. Then TikTok showed that short videos could reveal the same consumer trends and influencer dynamics. Photos, videos—both just signal user interests.
🏥 The 23andMe Mirage: 23andMe believed their genetic data was unparalleled. But medical records and even fitness tracker data can predict health risks too. Genetics is valuable BUT it is just one path to the same health insights.
🎮 The Roblox Reality Check: Roblox's user-generated game data seemed irreplaceable. Until Twitch streaming data and Discord server analytics offered similar gamer behavior insights. Different platforms, same gamer patterns.
Let’s change the way we think about data and now look at some vehicles for data monetization.
The Most Important Rule of Data Value
It's All About the Action 🎯
Data by itself is about as valuable as a book in a language you don't understand.
The real gold lies in what you can do with the data.
Whether it's Google using search data to target ads 🎯,
quant funds using satellite imagery for trading 📈,
OpenAI using billions of web pages to teach ChatGPT how to chat,
or Qualtrics running surveys to show companies what their customers are thinking and feeling,
it's simple: the action enabled by data creates value.
APIs: The Secret Sauce in the Data Economy
But what if you don't want to build a whole SaaS platform?
Enter APIs, the unsung heroes of the data economy.
API pricing strategies
APIs can be priced around three use cases:
1) Integration
2) Proprietary data that changes fast
3) Making data digestible for action by humans or AI
Here are the ways you can think about pricing API calls for data:
Per call: Each API request costs ¢. High volume? Discounts apply! 📞
By data volume: The more data returned, the higher the price.
Tiered access: The basic tier gets you names, premium gets you full profiles.
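The three API pricing levers above can be sketched in a few lines. This is a rough illustration, not a recommended price list: every price, discount threshold, tier name, and field name below is a made-up assumption.

```python
# All prices, thresholds, and tier definitions here are hypothetical.
PER_CALL_CENTS = 0.5          # list price per request, in cents
VOLUME_DISCOUNTS = [          # (minimum monthly calls, discount), largest first
    (1_000_000, 0.30),
    (100_000, 0.15),
]
PER_MB_CENTS = 2.0            # price per megabyte returned, in cents
TIER_FIELDS = {
    "basic": {"name"},
    "premium": {"name", "email", "company", "title"},
}


def per_call_cost(calls: int) -> float:
    """Per-call pricing in cents, with a discount for high volume."""
    discount = next(
        (d for minimum, d in VOLUME_DISCOUNTS if calls >= minimum), 0.0
    )
    return calls * PER_CALL_CENTS * (1 - discount)


def volume_cost(megabytes: float) -> float:
    """Pricing by data volume in cents: more data returned costs more."""
    return megabytes * PER_MB_CENTS


def filter_by_tier(profile: dict, tier: str) -> dict:
    """Tiered access: return only the fields the tier has paid for."""
    allowed = TIER_FIELDS[tier]
    return {key: val for key, val in profile.items() if key in allowed}


profile = {"name": "Ada", "email": "ada@example.com",
           "company": "Acme", "title": "CTO"}
print(per_call_cost(50_000))             # below the first discount threshold
print(per_call_cost(200_000))            # qualifies for the 15% discount
print(filter_by_tier(profile, "basic"))  # basic tier only sees the name
```

In practice these levers are usually combined, for example a tier that sets the visible fields plus a per-call meter on top.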
The Data Quality Conundrum:
What's "Good" Data Anyway?
In the software world, quality is often about features, uptime, and bug fixes. But in data land? It's a whole different ball game.
For quant funds, it's all about precision. One bad data point can mean millions lost. 📉💸
For adtech, coverage is king. More audience profiles = more ads sold.
For AI, it's about structure and cleanliness. Clean, well-annotated data makes models smarter. 🏷️🤖
For market research, it's about quality survey responses: real people thinking hard before filling out surveys. No bots allowed.
The SaaS (Data wrapper) pricing
Wrapping Data in User-Friendly APIs or Dashboards - Here's where SaaS enters. Instead of selling raw data, smart companies are packaging it into software.
It's like selling pre-baked cookies 🍪 instead of flour and sugar.
Google doesn't just hand over user intent data; they wrap it in the Google Ads (formerly AdWords) platform.
Bloomberg's terminals aren't just data feeds; they are full-fledged financial workstations.
Why does this work? Because SaaS makes data actionable.
You're not just buying information; you're buying the tools to turn that information into money.
What is the hard part?
Pricing…
Think of it as a "Data Buffet": the pricing models below map onto different aspects of a buffet-style restaurant.
Plate Size Pricing: Just like a buffet where you pay more for a larger plate, allowing you to pile on more food, data vendors charge based on your "plate size." This could mean:
More profiles (bigger plate, more customer data)
Longer histories (deeper plate, more historical data)
More fields or records (wider plate, more diverse data)
Gourmet Tier Pricing: At a buffet, premium stations offer higher-quality items like sushi or prime rib. In the data world:
High-accuracy data is like perfectly cooked steak. Note: a quality dimension to data always commands a disproportionate price increment.
Fully structured data is akin to beautifully plated dishes
Annotated data is like having a chef's notes on each dish
Service Level Pricing: Buffets vary in service. Some offer faster seating, fresher food rotations, or exclusive tables. Similarly, data vendors offer:
Speed (VIP seating for data access)
Recency (just-cooked data updates)
Exclusivity (private dining room data rights)
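To make the buffet concrete, here is a small sketch of a quote that multiplies the three dimensions together. The base price, multiplier values, and tier names are all invented for illustration; the only idea taken from the list above is that plate size scales roughly linearly while quality and service act as premiums on top.

```python
# Hypothetical numbers; a real vendor would calibrate these to its market.
BASE_PRICE_PER_1K_RECORDS = 10.0   # dollars, the "plate size" unit price

QUALITY_MULTIPLIER = {             # "gourmet tier"
    "raw": 1.0,
    "structured": 1.5,
    "annotated": 2.5,              # quality steps up faster than volume does
}
SERVICE_MULTIPLIER = {             # "service level"
    "standard": 1.0,
    "fresh": 1.25,                 # more recent updates
    "exclusive": 3.0,              # exclusive data rights
}


def buffet_quote(records: int, years_of_history: int,
                 quality: str = "raw", service: str = "standard") -> float:
    """Quote in dollars: plate size x gourmet tier x service level."""
    if records < 0 or years_of_history < 1:
        raise ValueError("records must be >= 0 and years_of_history >= 1")
    plate = (records / 1000) * BASE_PRICE_PER_1K_RECORDS * years_of_history
    return plate * QUALITY_MULTIPLIER[quality] * SERVICE_MULTIPLIER[service]


print(buffet_quote(10_000, 1))                            # plain plate
print(buffet_quote(10_000, 2, "annotated", "exclusive"))  # the full spread
```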
OK, enough of that metaphor. Here are the other ways to help you think about pricing
Per feature: Want that fancy graph? Pay up! 💳
Per action: Dynamic pricing tools that consider context and market factors. A data marketplace could develop a tool similar to Airbnb's dynamic pricing tool for hosts, one that weighs factors like market trends, competitor pricing, and customer behavior to suggest optimal prices.
Problem-based pricing: Customers set the price they're willing to pay for specific data or outcomes. Kaggle competitions, where sponsors set prizes for the best machine learning models, are one example.
One thing to watch out for as a data challenger
Brand sensitivity pricing.
For example, if you are selling financial data and going up against Bloomberg, you have to factor in brand affinity and discount your data pricing accordingly.
The AI Data Appetite
Quantity is a type of Data Quality
Here's a curveball: for AI, more data often trumps better data.
It's what researchers call "the unreasonable effectiveness of data."
OpenAI didn't make ChatGPT smarter by hand-picking the best web pages; they just fed it more.
A lot more.
This changes data pricing:
Bulk discounts become crucial. The more petabytes you buy, the cheaper each gigabyte. So now there is a rush among companies like OpenAI and Anthropic (the maker of Claude) to scale the scraping and annotation of ALL public data. Their competitive advantage is based on this:
Success = Better LLMs + efficient processing + more access to data + quality loop with humans
Let’s break that down a little more.
Better LLMs are not based on just algorithmic superiority but also on how diverse pathways get added through a human collaborative approach (a.k.a. Mistral's approach).
Efficient processing includes efficient crawling, data pipelines, efficient chips, networks, and the ability to work with small data.
More access to data means breaking down regulatory barriers and rethinking patent and copyright models.
Quality loop with humans means working with existing data and artificially created data across multiple contexts, languages, and dialects. Even a sprinkle of quality (like deduplication or human-in-the-loop annotation) amplifies the LLM’s superiority.
The Recurring Revenue Dream
Data as a Service (DaaS) 💸🔄
SaaS lives and dies by Monthly Recurring Revenue (MRR).
But with data, especially for AI, most of the value is in the historical corpus.
Reddit's decade of posts matters more than this week's hot memes.
Solution? Build data flywheels:
User-generated content (Reddit, Twitter) that keeps growing.
User-tuned models (Hugging Face AI models) that keep getting better.
Business data (Nasdaq's exchange data) that updates naturally.
Tech-driven loops (Google's search-ad loop) that get smarter with use.
With these, you can offer data subscriptions just like SaaS. Monthly data dumps, API quotas that refresh, better model updates or continuous real-time feeds.
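Of the subscription shapes just listed, "API quotas that refresh" is the easiest to sketch. The class below is a minimal illustration with hypothetical names (QuotaSubscription, try_call); it assumes calendar-month billing and keeps state in memory, which a real service would replace with a shared store.

```python
from datetime import date


class QuotaSubscription:
    """Tracks API calls against a quota that refreshes every calendar month."""

    def __init__(self, monthly_quota: int, monthly_price: float):
        self.monthly_quota = monthly_quota
        self.monthly_price = monthly_price  # recurring revenue per customer
        self._period = None                 # (year, month) of the current window
        self._used = 0

    def _roll_period(self, today: date) -> None:
        period = (today.year, today.month)
        if period != self._period:          # new month: the quota refreshes
            self._period = period
            self._used = 0

    def try_call(self, today: date) -> bool:
        """Consume one call if quota remains; return False when exhausted."""
        self._roll_period(today)
        if self._used >= self.monthly_quota:
            return False
        self._used += 1
        return True

    def remaining(self, today: date) -> int:
        """Calls left in the month that contains `today`."""
        self._roll_period(today)
        return self.monthly_quota - self._used


sub = QuotaSubscription(monthly_quota=2, monthly_price=99.0)
print(sub.try_call(date(2024, 1, 30)))   # first call in January
print(sub.try_call(date(2024, 1, 31)))   # second call uses up the quota
print(sub.try_call(date(2024, 1, 31)))   # rejected until the month rolls over
print(sub.remaining(date(2024, 2, 1)))   # February starts with a full quota
```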
The New Frontier: Synthetic Data and APIs
Imagine an API that doesn't just serve data, but generates it on demand. That's synthetic data.
Use today's AI to generate training data for tomorrow's AI. It's a data perpetual motion machine:
Unlimited volume (no more data scarcity) ♾️
High quality (structured, annotated, with "unbiased" as the goal)
Always fresh (no staleness issues)
Pricing?
Think SaaS + API + AI:
Tiered plans by volume and quality
Per-generation pricing (like per-API-call)
Upsells for specific data characteristics
Making Your Market:
Here's a secret:
the easier it is to measure a dataset's ROI, the bigger its market.
This is why finance and adtech are data goldmines - you can see the dollars flow in and out.
For SaaS and API data products:
Offer analytics dashboards. Show customers their ROI. 📉
Integrate with their existing tools. Make value visible.
Case studies and benchmarks. Make your data's impact legible.
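The number behind such a dashboard can be very simple. Here is a minimal sketch of one possible ROI calculation; the function name, its inputs, and the figures are assumptions for illustration, and the hard part in practice is agreeing on how incremental revenue and savings are measured.

```python
def data_roi(incremental_revenue: float, cost_savings: float,
             data_cost: float) -> float:
    """ROI of a data product: net gain divided by what the customer paid."""
    if data_cost <= 0:
        raise ValueError("data_cost must be positive")
    return (incremental_revenue + cost_savings - data_cost) / data_cost


# Illustrative numbers only.
roi = data_roi(incremental_revenue=30_000, cost_savings=5_000,
               data_cost=10_000)
print(f"ROI: {roi:.0%}")
```

In finance and adtech the two inputs on the left are observable almost directly, which is exactly why those markets are easier to sell into.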
The Final Frontier: Data Governance and Compliance 🏛️
As data becomes more valuable, governance becomes critical. It's like installing a safe when you strike gold.
SaaS and API implications:
Compliance as a feature. GDPR, CCPA-ready tiers.
Auditable APIs. Log every call, every data point.
Data provenance tracking. Know your data's "family tree."
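As a rough sketch of what "log every call" and provenance tracking could look like together, the decorator below writes one audit record per API call and tags it with the dataset and its upstream source. Everything named here (audited, AUDIT_LOG, get_profiles, the dataset and source labels) is hypothetical, and the in-memory list stands in for an append-only store.

```python
import functools
import json
import time
import uuid

AUDIT_LOG = []  # stand-in for an append-only audit store (hypothetical)


def audited(dataset: str, source: str):
    """Decorator that records who called what, when, and where the data came from."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(caller_id: str, *args, **kwargs):
            result = func(caller_id, *args, **kwargs)
            AUDIT_LOG.append({
                "call_id": str(uuid.uuid4()),
                "timestamp": time.time(),
                "caller_id": caller_id,      # an ID, not the secret API key
                "endpoint": func.__name__,
                "dataset": dataset,
                "source": source,            # provenance: upstream origin
                "records_returned": len(result),
            })
            return result
        return wrapper
    return decorator


@audited(dataset="company_profiles_v1", source="vendor_feed_2024")
def get_profiles(caller_id: str, country: str) -> list:
    # Placeholder data; a real endpoint would query a database.
    return [{"name": "Acme", "country": country}]


get_profiles("customer-123", "DE")
print(json.dumps(AUDIT_LOG[-1], indent=2))
```

The same records can feed usage-based billing, so the audit trail doubles as the pricing meter.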
Wrapping Up: We've come full circle with Data
Just as SaaS revolutionized software by making it a service, data is being transformed into a service through smart pricing strategies, APIs, and AI-driven innovations.
SaaS wraps data in usable, priceable packages. 🎁
APIs make data consumable and scalable.
AI turns data scarcity into data abundance.
In this new world, data isn't just the new oil or the new gold. Data is the new software. And just like software ate the world, data — priced right, delivered right — is set to do the same.
So, whether you're sitting on a data goldmine or building the next data-driven unicorn, remember: in the data economy, the right pricing strategy isn't just smart business. It's your rocket fuel. 🚀