Why most experimentation programs stall

I've spent the better part of seven years building and fixing experimentation programs. Most of the ones I've encountered didn't fail because of bad ideas or lack of resources. They failed because of structural problems that compound quietly over time.

Here's what I mean.

The testing trap

Teams run tests without a clear hypothesis framework. They're optimising buttons when they should be questioning entire user journeys. The result is lots of activity, minimal learning, and zero compounding value.

I worked with a fintech company last year that was running 40+ tests per quarter. Their win rate was 12%. When we dug in, we found that most tests were reactive—someone had an idea, they tested it, it didn't work, they moved on. There was no connective tissue between experiments.

The shift that helped was establishing what I call a hypothesis hierarchy: start with strategic questions about the business model, then work down to tactical optimisations. It's slower to set up, but the learning compounds in ways that random testing never does.
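
To make the idea concrete, here's a minimal sketch of a hypothesis hierarchy as a data structure. The field names, level labels, and example statements are my own illustrations, not a prescribed tool.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Hypothesis:
    """One node in a hypothesis hierarchy: strategic at the top, tactical beneath."""
    statement: str                         # a falsifiable claim, not a feature idea
    level: str                             # "strategic" or "tactical" (illustrative labels)
    parent: Optional["Hypothesis"] = None  # every tactical test traces back to a strategic question
    experiments: list = field(default_factory=list)  # IDs of tests probing this hypothesis

# A strategic question about the business model...
strategic = Hypothesis(
    statement="New users churn because they never reach the core value moment",
    level="strategic",
)

# ...and a tactical optimisation that inherits its context.
tactical = Hypothesis(
    statement="Cutting onboarding from seven steps to three gets more users to that moment",
    level="tactical",
    parent=strategic,
)
tactical.experiments.append("EXP-042")
```

The point of the structure is the parent link: when a tactical test fails, the strategic hypothesis above it tells you what to try next, instead of moving on to an unrelated idea.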

The velocity obsession

"We need to run more tests" is rarely the answer. I've seen teams proudly announce 50+ tests per quarter while their win rate hovers at 15% and average impact stays flat. Velocity becomes a vanity metric.

What actually matters is learning velocity. One well-designed experiment that changes how you think about your product is worth more than ten inconclusive A/B tests. This is a hard sell in organisations that reward activity over insight, but it's true.
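
If you want to put a number on the distinction, it's easy to sketch. The record format below is an assumption on my part; "changed a decision" means whatever your team actually logs when a result alters the roadmap.

```python
# Illustrative only: test velocity counts runs, learning velocity counts
# the experiments that actually changed a decision.
experiments = [
    {"id": "EXP-040", "conclusive": True, "changed_a_decision": True},
    {"id": "EXP-041", "conclusive": True, "changed_a_decision": False},
    {"id": "EXP-042", "conclusive": False, "changed_a_decision": False},
]

test_velocity = len(experiments)
learning_velocity = sum(1 for e in experiments if e["changed_a_decision"])

print(f"tests run: {test_velocity}, decisions changed: {learning_velocity}")
# tests run: 3, decisions changed: 1
```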

The silo problem

Experimentation lives in the growth team, separate from product and engineering. Results get shared in decks, not in decision-making moments. Learnings die in Confluence pages that no one reads.

The fix isn't organisational restructuring—it's changing how experiments integrate into existing workflows. Making "what experiment would validate this?" a standard question in planning. Embedding experimentation thinking into every product decision, not just the ones that happen to land on the growth team's roadmap.

What I've seen work

The programs that scale and compound share a few characteristics. Clear ownership with executive sponsorship. Integrated workflows where experiments actually inform roadmaps. Learning repositories that capture insights, not just results. And capability building, so experimentation spreads beyond one team.
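
The learning repository is the easiest of those to make concrete. Here's a minimal sketch of what one entry could capture; the schema and example values are assumptions of mine, not a standard format.

```python
from dataclasses import dataclass

@dataclass
class LearningEntry:
    """One repository entry. Captures the insight and its consequence,
    not just the raw result. Field names are illustrative."""
    experiment_id: str
    hypothesis: str         # what we believed going in
    result: str             # what the data showed, including inconclusive outcomes
    insight: str            # what we now believe about users or the product
    decision_affected: str  # the roadmap or design decision this should inform

entry = LearningEntry(
    experiment_id="EXP-042",
    hypothesis="Shorter onboarding gets more users to the core value moment",
    result="No significant lift; drop-off moved to the first in-product task",
    insight="Onboarding length isn't the bottleneck, first-task friction is",
    decision_affected="Prioritise a first-task redesign over further signup tweaks",
)
```

Note the last two fields: a results log tells you what happened, but it's the insight and the decision it should inform that keep an entry alive past the next sprint.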

If your program is stalled, don't try to fix everything at once. Pick one of these problems, address it systematically, and measure whether your learning rate improves.

The goal isn't more tests. It's better decisions, faster.