Experimentation guide for product designers and managers

How to stop resisting experimentation and make the most of it.

Written together with Till Westermann.

0. Do we really need experimentation?

Experimentation, or A/B testing, is the most reliable way to establish a cause-effect relationship between a product change and a metric change. Pre-post analysis (comparing a metric before and after a release) limits how many changes you can ship in a given period, since overlapping changes can’t be told apart, and its results are distorted by seasonality. Organizations that want to measure the impact of improvements introduced by product and development teams have to adopt A/B testing.
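To make the cause-effect comparison concrete, here is a minimal sketch of how the results of a simple A/B test on conversion rate could be compared. It assumes that users are randomly split between control (A) and treatment (B), and that a two-proportion z-test is an acceptable approximation. The function name and all numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare the conversion rates of control (A) and treatment (B).

    Returns the z statistic and a two-sided p-value, using the pooled
    standard error under the null hypothesis that both rates are equal.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: 10,000 users per group, 5.0% vs 5.6% conversion.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these hypothetical numbers the p-value lands just above 0.05. The treatment looks better, but the result would not be called conclusive at the usual 5% significance level.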

But how do you make the most of the practice? People often get stuck with a simplistic understanding of experimentation — experimentation as a filter. We test our ideas to filter out bad ones and keep good ones.

Figure: A simplistic understanding of experimentation. Experimentation as a filter between the ideas we think are good and the ideas that are actually good.

While this model is helpful, experimentation is far more than that: it’s a mechanism for updating your beliefs. What we learn in one experiment informs further experiments and the entire product strategy. Validating or invalidating a hypothesis early lets a team pivot in time.

Figure: Experimentation learning cycle (Observation → Hypothesis → Experiment → Results → Observation). Results of previous experiments inspire new experiments.

Plus, with experimentation, we know the impact of what we’re doing. If a change is conclusively positive but its impact is very small, does it justify the added code complexity and maintenance cost?
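One way to make the “conclusively positive but very small” question explicit is to compare the confidence interval of the lift with the smallest lift the team considers worth the added complexity. The sketch below assumes three things: the metric is a conversion rate, a normal-approximation 95% interval is adequate, and the team chooses a hypothetical `min_worthwhile_lift` itself. The function names and numbers are illustrative.

```python
import math


def lift_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Normal-approximation confidence interval (95% for z=1.96) for the
    absolute lift p_b - p_a between treatment (B) and control (A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    lift = p_b - p_a
    return lift - z * se, lift + z * se


def assess_lift(conv_a, n_a, conv_b, n_b, min_worthwhile_lift):
    """Classify a result by both statistical and practical significance."""
    low, high = lift_confidence_interval(conv_a, n_a, conv_b, n_b)
    if low <= 0:
        return "not conclusively positive"
    if high < min_worthwhile_lift:
        return "positive, but smaller than the minimum worthwhile lift"
    if low >= min_worthwhile_lift:
        return "positive and large enough to justify the cost"
    return "positive, but unclear whether it clears the minimum worthwhile lift"


# Hypothetical: 1,000,000 users per group, 5.00% vs 5.15% conversion,
# and the team decided a lift below 0.5 percentage points isn't worth maintaining.
print(assess_lift(50_000, 1_000_000, 51_500, 1_000_000, min_worthwhile_lift=0.005))
```

Here the interval lies entirely above zero (roughly 0.09 to 0.21 percentage points), so the change is conclusively positive. Yet it stays below the team’s threshold, which is exactly the case where the maintenance cost deserves a second look.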

1. How to experiment well

Experimenting well is not easy and requires a shift in mindset. Experimentation proves teams wrong more often than is comfortable. Things that seemed like “obvious user experience improvements” don’t have the expected impact. Instead of moving on to the next item on the roadmap, a team has to iterate, which is frustrating and time-consuming. As a result, many people resist experimentation.

To overcome people’s reluctance, management is tempted to introduce the number of experiments as a KPI. This often backfires as teams start running experiments for the sake of running experiments.

Running many experiments comes at a cost: not just infrastructure load and growing development complexity, but also product misalignment and inconsistency. If you run just 10 concurrent experiments with two variants each, users can see up to 2¹⁰ = 1,024 different versions of the product.
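The arithmetic behind the 2¹⁰ figure can be checked directly. This sketch makes three assumptions: ten experiments run at the same time, each has one control and one treatment, and users are assigned to each experiment independently. Under these assumptions, every combination of assignments is a version of the product that some user might see.

```python
import math
from itertools import product

# Ten concurrent experiments, each with a control ("A") and one treatment ("B").
experiments = [("A", "B")] * 10

# Every combination of assignments is a distinct version of the product.
versions = list(product(*experiments))
print(len(versions))

# The same count without enumerating: multiply the number of variants per experiment.
print(math.prod(len(variants) for variants in experiments))
```

Both lines print 1,024. Turning a single one of the ten into an A/B/C test would raise the count to 2⁹ × 3 = 1,536.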

That’s why it’s important to remember that experimentation is not the goal; it’s a way to validate your hypotheses. Successful experimentation comes from a habit of thinking in hypotheses.

In addition, experimentation requires a careful rethinking of how success is defined. In many companies, delivery equals success, and questions about the impact or learnings are rarely asked. When such a company introduces experimentation and most experiments fail, people worry that their failure is visible and they will be punished. So leadership needs to set the expectation that unsuccessful experimentation is fine as long as teams learn from it and the learning cycle keeps spinning faster.

2. How to write a good hypothesis

A hypothesis is the foundation of an experiment. It ties together the impact you expect to see from your change and the reason you expect it.

If we do [a treatment], we’ll see [change in metric]

Example: Consider an e-commerce site. If we rename the “Buy now” button to “Add to cart,” we’ll see the conversion rate increase.

This hypothesis is weak because it doesn’t refer to any prior knowledge or observations, so it’s not clear why we expect the stated outcome (increase in conversion).

Figure: A weak hypothesis lacks a foundation of observations or prior knowledge.

Experiments with weak hypotheses are sometimes worth running if the running cost* is low and the change doesn’t affect the holistic user experience (though it usually does).

*Running cost: development cost (including design, copy, localization, QA, and hidden data-analysis costs) and runtime (are we willing to wait 2 months for results?).
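Runtime often surprises teams, because it grows quickly as the expected effect shrinks. A rough estimate can come from the standard sample-size formula for comparing two proportions (two-sided test, normal approximation). Everything in the example is hypothetical: the helper names, the baseline conversion rate, the expected relative lift, and the daily traffic. The values α = 0.05 and 80% power are common defaults, not requirements.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each group to detect the given relative lift in a
    conversion rate with a two-sided test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def runtime_days(baseline: float, relative_lift: float, daily_users: int,
                 groups: int = 2) -> int:
    """Days until every group reaches the required sample size, assuming
    daily_users enter the experiment each day and are split evenly."""
    n = sample_size_per_group(baseline, relative_lift)
    return math.ceil(n * groups / daily_users)


# Hypothetical: 5% baseline conversion, 10,000 users entering the experiment per day.
for lift in (0.10, 0.05, 0.01):
    print(f"{lift:.0%} relative lift: ~{runtime_days(0.05, lift, 10_000)} days")
```

Under these assumptions, the required runtime grows sharply as the expected lift shrinks:

- A 10% relative lift is detectable in about a week.
- A 5% lift takes close to a month.
- A 1% lift would need well over a year.

This is why “do we want results in 2 months?” belongs in the cost estimate.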

The higher the cost of validation, the stronger the hypothesis we want. One downside of experimentation as a validation method is that you first have to build what you want to test. So if it’s expensive to build, we want more confidence that we’re solving a real problem and solving it in the right way. This confidence might come from user research, data analysis, or usability heuristics.

Based on [research, data, past experiments, psychology principles etc.] we believe that

Applying [treatment] will cause [behavior change]

We consider this hypothesis valid if we see [change in metric].

Example for the same e-commerce store:

Based on user research, we know that users would like to learn more about the product before purchasing. We believe that adding a way to message suppliers will help users get their questions answered and buy the desired product. We consider this hypothesis valid if we see an increase in conversion rate.
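Some teams that keep a backlog of experiments store hypotheses in a structured form, so that the evidence and the success criterion can’t be skipped. The following is a minimal sketch of that idea, assuming a hypothetical `Hypothesis` record whose fields mirror the template above. The field names and the `is_grounded` check are illustrative, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    treatment: str
    behavior_change: str
    metric: str
    expected_direction: str  # e.g. "increase" or "decrease"
    # Research, data, past experiments, psychology principles, etc.
    evidence: list[str] = field(default_factory=list)

    def is_grounded(self) -> bool:
        """False for the weak form that names no observations or prior knowledge."""
        return bool(self.evidence)

    def render(self) -> str:
        basis = ", ".join(self.evidence) if self.evidence else "no evidence yet"
        return (
            f"Based on {basis}, we believe that {self.treatment} "
            f"will cause {self.behavior_change}. We consider this hypothesis "
            f"valid if we see a statistically significant {self.expected_direction} "
            f"in {self.metric}."
        )


# The supplier-messaging example from the text, expressed as a record.
messaging = Hypothesis(
    treatment="adding a way to message suppliers",
    behavior_change="users to get their questions answered and buy the desired product",
    metric="conversion rate",
    expected_direction="increase",
    evidence=["user research"],
)
print(messaging.is_grounded())
print(messaging.render())
```

The button-renaming example would produce a record with an empty `evidence` list. `is_grounded()` flags that case before anyone starts building.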

A good hypothesis increases the chances of learning and decreases the risk of failure.

Summing things up

Experimentation is an essential tool for measuring the impact of product changes, but it’s difficult to adopt. It requires not only a technical setup but also a mindset shift.

The key is to think of product ideas as hypotheses to be validated rather than fixed deliverables, and developing good hypotheses takes practice.

A good hypothesis ties together what kind of behavioral and metric change to expect from implementing an idea, and what gives confidence in the idea.

Elena Borisova

Combining data and psychology for product and design decision-making | Head of Design at DeepL | elenaborisova.com