The Signal

10× ARR in Half a Year: Why Structuring Unstructured Data Is Exploding

Once Data Becomes Structured, Everything Can Be Automated

John Tian
Dec 29, 2025

Many of the most compelling AI startups today are built by founders who focus squarely on enterprise customers. As a result, they often develop a much sharper intuition for where real, durable value lies in the B2B market.

Last year, Box founder Aaron Levie articulated a view with which I strongly agree: the single biggest opportunity in AI is not chatbots or consumer apps, but the ability to finally make sense of unstructured data.

The 90% Problem Inside Every Enterprise

Inside most companies, unstructured data accounts for roughly 90% of all information. Historically, this data was either impossible to process or painfully inefficient to work with. AI is now changing that equation entirely.

Levie has said that since founding Box, he has never seen a shift this dramatic in how enterprises handle information. For years, working with structured data was relatively straightforward. Anything stored in databases—ERP, CRM, HR systems—could be queried, calculated, aggregated, summarized, and analyzed with ease.

But that was always just a small slice of the picture.

In reality, structured data makes up only about 10% of enterprise information. The remaining 90% lives in documents, contracts, product specifications, financial records, marketing assets, videos, and other unstructured formats.

This information could be stored, shared, and searched—but it could not truly be understood. Its contents were largely opaque to machines.

Generative AI Changes the Rules

Generative AI changes that for the first time. With multimodal models, we can now interact with unstructured data directly. Computers can process text, images, and documents at unprecedented scale and speed, performing tasks that once required human judgment.

That shift fundamentally changes how enterprises work with information. Content is no longer a passive digital artifact that gets touched occasionally. It becomes a form of shared organizational memory—accessible to anyone, at any time.

Crucially, more information no longer makes things harder. We are moving into a world where having more data actually increases leverage. Digital information becomes one of a company’s most valuable assets.

Once Data Becomes Structured, Everything Can Be Automated

Once you can understand what’s inside a contract, an invoice, or a document—and extract structured data from it—you can automate nearly any workflow.

Sam Lessin, a partner at Slow Ventures, has made a similar argument: once tools like NotebookLM exist, traditional structured CRM systems start to lose their relevance.

AI meeting assistants and note-takers fit squarely into this same trend. At their core, they are about turning unstructured speech into usable data.

As Otter’s founder has pointed out, meetings are one of the largest productivity black holes for knowledge workers, and voice remains one of the most underutilized data sources inside companies.

The 1st AI Note-Taker Surpasses $100M ARR; 2 AI Fitness Apps Hit $160M and $10M ARR
John Tian, December 26, 2025

Otter’s evolution closely mirrors Levie’s thesis: expand across more data types, then layer agents on top to drive deeper enterprise automation.

Glean’s rapid growth reflects the same dynamic, with an even stronger emphasis on security and deep integration into internal company environments. Executives today are actively looking for a safe, reliable, enterprise-grade version of ChatGPT—one that actually understands how their company works.

In effect, Glean brings to the enterprise what ChatGPT delivers to consumers, grounded in a company's own context.

One of the biggest obstacles enterprises face with AI is that most models were never designed for their specific business. They are trained on public internet data, and when dropped into a corporate environment, they lack the context needed to be genuinely useful.

The Quiet Rise of Unstructured Data Infrastructure

Earlier, a16z led the $130M Series B of Hebbia, a company focused on structuring unstructured data in verticals such as finance.

More recently, however, a horizontal infrastructure company focused purely on turning unstructured data into structured data has emerged as a particularly clear signal of where the market is heading.

In six months, it went from zero to over $1 million in ARR. Over the following six months, its ARR grew another 10×, surpassing $10 million.