Option A

Conversions

Total reached

Observed: 3.0%

Option B

Conversions

Total reached

Observed: 4.0%

Prior

A starting belief before this experiment's data arrive. Enter the historical conversions and total for this channel or page; both arms start from the same baseline. Leave both at zero for a flat prior (every rate equally plausible), which suits genuinely new experiments with no history to draw on.

Conversions

Total reached

Baseline: — (flat)

Leave both at zero for a flat prior. Both arms share this baseline, and each arm's experiment data update it from there.
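Here is a minimal sketch of how a shared baseline and each arm's data could combine. It assumes the standard Beta-binomial conjugate update, and it assumes historical counts are simply added to a flat Beta(1,1). The calculator does not say it works exactly this way. `BetaBelief`, `prior_from_history` and `update` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a conversion rate, expressed as Beta(alpha, beta)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_history(hist_conversions: int, hist_total: int) -> BetaBelief:
    """Shared baseline for both arms (assumption: history added to a flat Beta(1, 1))."""
    if not 0 <= hist_conversions <= hist_total:
        raise ValueError("need 0 <= conversions <= total")
    # Zero conversions out of zero leaves the flat Beta(1, 1) unchanged.
    return BetaBelief(1 + hist_conversions, 1 + hist_total - hist_conversions)


def update(prior: BetaBelief, conversions: int, total: int) -> BetaBelief:
    """Conjugate Beta-binomial update: conversions add to alpha, non-conversions to beta."""
    if not 0 <= conversions <= total:
        raise ValueError("need 0 <= conversions <= total")
    return BetaBelief(prior.alpha + conversions, prior.beta + total - conversions)


if __name__ == "__main__":
    baseline = prior_from_history(0, 0)  # flat prior: both fields left at zero
    post_a = update(baseline, 30, 1000)
    post_b = update(baseline, 40, 1000)
    print(post_a, post_b)
```

Both arms share one baseline, so the same prior object is passed to `update` for A and for B.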

Your bar

Not every lift is worth acting on. A win of 0.1 percentage points may be too small to justify changing anything. This sets the smallest improvement that would actually matter to you: the uplift B must achieve over A to count as a real win.

1.0 pts

B must beat A by at least this many percentage points to count as a win, not just by a hair.

50%

Chance B clears your bar

89%

Chance B beats A at all

+1.0

Most likely gap, points

…

Believable range of each option’s true rate (curves: A and B)

Believable range: the set of values the true conversion rate could plausibly take, given what you observed (statisticians call this a credible interval). A narrow curve means high confidence; a wide curve means low confidence. Where the two curves overlap lies the uncertainty: the two options might really have the same rate.

Posterior difference (B − A)

Method. Each true conversion rate is modelled as a Beta distribution (the two curves). Probabilities are computed deterministically by numerical integration to high precision, with no simulation noise. Unless you enter a historical prior, the starting assumption is that every rate is equally plausible (a uniform Beta(1,1) prior). With samples of this size, the data dominate it.

Beta distribution: a flexible mathematical shape for representing belief about a rate or proportion. It can be roughly symmetric or strongly lopsided, narrow or wide. The two curves on the chart are Beta distributions, one for each option’s true rate.
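Both headline probabilities can be written as one integral. This sketch assumes the flat-prior case. Here c and n are each arm's conversions and total reached, f_A is A's posterior density, F_B is B's posterior cumulative distribution, and δ is your bar expressed as a rate (1 point = 0.01):

```latex
\theta_A \mid \text{data} \sim \mathrm{Beta}(1 + c_A,\ 1 + n_A - c_A), \qquad
\theta_B \mid \text{data} \sim \mathrm{Beta}(1 + c_B,\ 1 + n_B - c_B)

\Pr(\theta_B - \theta_A > \delta) = \int_0^1 f_A(x)\,\bigl[1 - F_B(x + \delta)\bigr]\,dx,
\qquad F_B(y) = 1 \ \text{for}\ y \ge 1
```

Setting δ = 0 gives "Chance B beats A at all". Setting δ = 0.01 gives "Chance B clears your bar" for a 1-point bar.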

Method

Why not just compare the rates?

You have two options, and one converts at a higher rate: two marketing creatives, say, or two variants in a product rollout. It’s tempting to call the higher one the winner. But a higher rate in a small sample can be luck, and a real win can still be too small to act on. This is a Bayesian comparison. It asks how likely it is that B clears your bar for a win worth acting on, and it computes that probability deterministically.

Bayesian comparison: a way of reasoning that treats the unknown rate as a range of plausible values rather than a single ‘true’ number. The result is a probability that one option genuinely beats the other, not a yes/no test verdict.

A rate is a range

Six heads in ten flips doesn’t prove a biased coin. A’s default of 30 conversions out of 1,000 looks like 3%, but that observed rate comes from one sample; the true rate sits somewhere in a range around it. Small samples mean wide ranges.
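To make “the true rate sits somewhere in a range” concrete, the sketch below computes an equal-tailed 95% credible interval under the flat Beta(1,1) prior by summing the posterior density over a fine grid. The function name `credible_interval` and the 200,000-point grid are illustrative assumptions, not the calculator’s internals. With these inputs, 6 heads in 10 leaves 0.5 inside the range, and 3 of 100 gives a much wider range than 30 of 1,000.

```python
import math


def credible_interval(conversions: int, total: int, level: float = 0.95,
                      grid_points: int = 200_000) -> tuple[float, float]:
    """Equal-tailed credible interval for a rate under a flat Beta(1, 1) prior.

    The posterior is Beta(1 + conversions, 1 + total - conversions). Its density
    is evaluated at grid-cell midpoints on (0, 1) and accumulated numerically.
    """
    a = 1 + conversions
    b = 1 + total - conversions
    xs = [(i + 0.5) / grid_points for i in range(grid_points)]
    # Unnormalised log-density; subtracting the peak keeps exp() well behaved.
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log1p(-x) for x in xs]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    lo_target = (1 - level) / 2
    hi_target = 1 - lo_target
    lo = hi = None
    cumulative = 0.0
    for x, w in zip(xs, weights):
        cumulative += w / norm
        if lo is None and cumulative >= lo_target:
            lo = x  # first cell where the lower tail is used up
        if cumulative >= hi_target:
            hi = x  # first cell where the upper tail begins
            break
    return lo, hi


if __name__ == "__main__":
    # A coin-flip case, then the same 3% rate at two sample sizes.
    for c, n in [(6, 10), (3, 100), (30, 1000)]:
        lo, hi = credible_interval(c, n)
        print(f"{c}/{n}: {lo:.3f} to {hi:.3f}")
```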

Compare the ranges

The two curves are those ranges. B’s 40 of 1,000 sits higher, yet its curve overlaps A’s, and that overlap is the uncertainty. The question isn’t which rate is higher. It’s how much of B’s range sits above A’s.
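One way to put a number on “how much of B’s range sits above A’s” is deterministic grid integration. Add up, over every cell, the chance that A lands in that cell times the chance that B lands above it. This sketch assumes the flat Beta(1,1) prior and a 100,000-cell grid. `posterior_grid` and `prob_b_beats_a` are hypothetical names, and this is not the calculator’s own code.

```python
import math


def posterior_grid(conversions: int, total: int, grid_points: int) -> list[float]:
    """Beta(1 + c, 1 + n - c) probability mass per cell of a midpoint grid on (0, 1).

    Assumes the flat Beta(1, 1) prior; the mass is normalised numerically.
    """
    a = 1 + conversions
    b = 1 + total - conversions
    logs = []
    for i in range(grid_points):
        x = (i + 0.5) / grid_points  # cell midpoint, never exactly 0 or 1
        logs.append((a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
    top = max(logs)  # subtract the peak so exp() cannot overflow
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def prob_b_beats_a(conv_a: int, total_a: int, conv_b: int, total_b: int,
                   grid_points: int = 100_000) -> float:
    """P(rate_B > rate_A): sum over cells of P(A in cell) * P(B above it)."""
    pa = posterior_grid(conv_a, total_a, grid_points)
    pb = posterior_grid(conv_b, total_b, grid_points)
    prob = 0.0
    b_above = 0.0  # B's mass strictly above the current cell
    for i in range(grid_points - 1, -1, -1):
        # A tie inside one cell is split evenly between the two outcomes.
        prob += pa[i] * (b_above + 0.5 * pb[i])
        b_above += pb[i]
    return prob


if __name__ == "__main__":
    print(f"P(B > A) = {prob_b_beats_a(30, 1000, 40, 1000):.3f}")
```

With the default 30 of 1,000 against 40 of 1,000, this lands at roughly 0.89, in line with the calculator’s “Chance B beats A at all”. A finer grid trades speed for precision.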


Decide on the gap

“B beats A” isn’t enough if the gap is trivial. Set your bar for the smallest win worth acting on, and read the probability that B clears it. The live results above show exactly this case: B beats A at all with 89% probability, but clears a 1-point bar with only 50%. That’s the number to act on.
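Reading “the probability that B clears it” as P(rate_B − rate_A > bar), a grid integration can compute it by requiring B to land at least the bar above A. This is a sketch under stated assumptions. It uses a flat Beta(1,1) prior and a 100,000-cell grid, which rounds the bar to the nearest 0.001 points. `posterior_grid` and `prob_b_clears_bar` are hypothetical names, not the calculator’s internals.

```python
import math


def posterior_grid(conversions: int, total: int, grid_points: int) -> list[float]:
    """Beta(1 + c, 1 + n - c) probability mass per cell of a midpoint grid on (0, 1)."""
    a = 1 + conversions
    b = 1 + total - conversions
    logs = []
    for i in range(grid_points):
        x = (i + 0.5) / grid_points  # cell midpoint, never exactly 0 or 1
        logs.append((a - 1) * math.log(x) + (b - 1) * math.log1p(-x))
    top = max(logs)  # subtract the peak so exp() cannot overflow
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]


def prob_b_clears_bar(conv_a: int, total_a: int, conv_b: int, total_b: int,
                      bar_points: float, grid_points: int = 100_000) -> float:
    """P(rate_B - rate_A > bar), with the bar in percentage points (1.0 = 0.01)."""
    if bar_points < 0:
        raise ValueError("bar_points must be non-negative")
    shift = round(bar_points / 100 * grid_points)  # the bar measured in grid cells
    pa = posterior_grid(conv_a, total_a, grid_points)
    pb = posterior_grid(conv_b, total_b, grid_points)
    # tail[k] = B's mass in cell k and above; tail[grid_points] = 0.
    tail = [0.0] * (grid_points + 1)
    for k in range(grid_points - 1, -1, -1):
        tail[k] = tail[k + 1] + pb[k]
    prob = 0.0
    for i, p in enumerate(pa):
        j = i + shift  # B's cell must sit at least `shift` cells above A's
        if j < grid_points:
            # Mass strictly above cell j, plus half of cell j for the boundary tie.
            prob += p * (tail[j + 1] + 0.5 * pb[j])
    return prob


if __name__ == "__main__":
    for bar in (0.0, 1.0):
        p = prob_b_clears_bar(30, 1000, 40, 1000, bar)
        print(f"bar {bar:.1f} pts: {p:.3f}")
```

A bar of 0.0 reduces to the plain “B beats A” probability. With the default inputs and a 1-point bar, the result sits near one half, which is why clearing the bar is a much weaker claim than merely beating A.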

Bayesian A/B Test Calculator
