How Testing Works

You don't need any of this to run a test — A/B Testing covers the practical workflow, and the results screen tells you what to do. This page is for when you want to know what's happening under the hood: how Accelerate decides a winner, and how it splits traffic while it's deciding.

Bayesian, not p-values

Traditional ("frequentist") testing makes you pick a sample size up front and wait until the test ends to read a result, then hands you a p-value to interpret. Accelerate uses a Bayesian model instead. At any moment it can answer the question you actually care about: what's the probability this variant is the best one? That's a direct statement you can act on, not a threshold you have to decode — and it's why you can watch a test from the moment it goes live without "peeking" invalidating it.

You'll see this as each variant's conversion rate plus its probability to win. (For the picture of how that certainty builds, see the posterior visual on A/B Testing.)
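To make "probability to win" concrete, here is a minimal sketch of one common way to compute it. It is an illustration, not Accelerate's actual model. It assumes each variant's conversion rate gets a Beta posterior starting from a uniform Beta(1, 1) prior, and estimates the probability by repeated sampling. The function name, the prior, and the example counts are all hypothetical.

```python
import random


def probability_to_win(results, draws=20000, seed=0):
    """Estimate each variant's chance of having the highest true conversion rate.

    results maps a variant name to (conversions, visitors).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per variant from its Beta posterior.
        # The uniform Beta(1, 1) prior is an assumption made for this sketch.
        sampled = {
            name: rng.betavariate(1 + conv, 1 + visitors - conv)
            for name, (conv, visitors) in results.items()
        }
        # Credit the variant whose sampled rate came out highest in this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Made-up counts: 40 of 1,000 visitors convert on the original, 62 of 1,000 on the variant.
print(probability_to_win({"original": (40, 1000), "variant": (62, 1000)}))
```

The output is one probability per variant, and together they sum to 1. That is the same kind of number the results screen reports.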

The one number that decides

One number: the chance the new version actually beats the original. The bar fills up from 50% (a coin toss) as evidence builds, and the dashed line is the 95% mark you need to clear. The moment it crosses and changes color, the test is settled. Ship the winner.

A winner is declared when a variant's probability to win holds at 95% or above, and only once every variant has gathered enough conversions to be measured. Both conditions matter: 95% on a handful of conversions is noise, not a result. Accelerate holds the call until the evidence is real.
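The two conditions can be sketched as a single check. This is an illustration only. The number of consecutive checks that counts as "sustained" and the minimum conversion count are hypothetical values chosen for the example, since this page does not state the ones Accelerate uses.

```python
def declare_winner(history, conversions, threshold=0.95,
                   sustain_checks=3, min_conversions=50):
    """Return the winning variant's name, or None if the test is not settled.

    history: list of {variant: probability_to_win} snapshots, oldest first.
    conversions: {variant: conversions so far}.
    sustain_checks and min_conversions are illustrative, not Accelerate's values.
    """
    # Condition 1: every variant has enough conversions to be measured.
    if any(count < min_conversions for count in conversions.values()):
        return None
    # Condition 2: the leader has held the threshold across the recent checks,
    # rather than touching it once.
    if len(history) < sustain_checks:
        return None
    recent = history[-sustain_checks:]
    leader = max(recent[-1], key=recent[-1].get)
    if all(snapshot.get(leader, 0.0) >= threshold for snapshot in recent):
        return leader
    return None


history = [
    {"original": 0.06, "variant": 0.94},
    {"original": 0.04, "variant": 0.96},
    {"original": 0.03, "variant": 0.97},
    {"original": 0.02, "variant": 0.98},
]
print(declare_winner(history, {"original": 80, "variant": 95}))  # variant
print(declare_winner(history, {"original": 12, "variant": 95}))  # None
```

The second call returns None even though the variant is above 95%, because the original has too few conversions to be measured.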

How the bandit allocates traffic

Same test, as bars

Two bars, one per version, each sized to the share of visitors it's getting right now. They start level, then Variant A stretches toward 84% while B collapses. Same thing as the dots, read straight off the bar lengths.

New tests don't hold a fixed 50/50 split. They run as multi-armed bandits, moving traffic toward whatever is winning, in four phases:

Burn-in. Every test opens with an even split, so early noise can't hijack the allocation before there's any real signal.
Bandit. As confidence builds, more traffic flows to the leading variant (Thompson sampling). A floor keeps every variant getting some traffic, so a slow starter can still come back; a cap stops a runaway leader from starving the test of the data it needs.
Confirmatory. Once a variant reaches the 95% threshold, the split returns to even for a short confirmatory phase, so the number you act on is unbiased.
Winner. The winning content is served to all traffic and the test completes.

The payoff: less of your audience spends the test seeing the losing option, and you still get a clean final reading.
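The four phases above can be sketched for a two-variant test. This is a simplified illustration, not Accelerate's implementation. The 10% floor and 90% cap are hypothetical values, as are the phase names and the uniform Beta(1, 1) prior. In Thompson sampling, each visitor is assigned by drawing one plausible conversion rate per variant and serving whichever draw is higher. Averaging many such draws gives the expected traffic share, which is what this sketch computes.

```python
import random


def traffic_split(phase, a, b, floor=0.10, cap=0.90, draws=5000, seed=0):
    """Return (share_a, share_b) for a two-variant test.

    phase: "burn_in", "bandit", "confirmatory", or "winner".
    a, b: (conversions, visitors) for each variant.
    floor and cap are illustrative values, not Accelerate's.
    """
    if phase in ("burn_in", "confirmatory"):
        # Even split: no early noise in burn-in, an unbiased final reading later.
        return 0.5, 0.5
    if phase not in ("bandit", "winner"):
        raise ValueError(f"unknown phase: {phase}")

    # Thompson sampling: fraction of posterior draws in which A beats B.
    rng = random.Random(seed)
    a_wins = sum(
        rng.betavariate(1 + a[0], 1 + a[1] - a[0])
        > rng.betavariate(1 + b[0], 1 + b[1] - b[0])
        for _ in range(draws)
    )
    p_a = a_wins / draws

    if phase == "winner":
        # All traffic goes to the winning content.
        return (1.0, 0.0) if p_a >= 0.5 else (0.0, 1.0)

    # Bandit phase: keep both shares inside [floor, cap] so neither variant
    # is starved of traffic and the leader cannot run away with all of it.
    low = max(floor, 1.0 - cap)
    high = min(cap, 1.0 - floor)
    share_a = min(max(p_a, low), high)
    return share_a, 1.0 - share_a


original, variant = (40, 1000), (62, 1000)  # made-up counts
for phase in ("burn_in", "bandit", "confirmatory", "winner"):
    print(phase, traffic_split(phase, original, variant))
```

With these made-up counts the variant is far ahead, so in the bandit phase the original is held at the floor instead of dropping toward zero.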

When you can trust the result

Before you act on a test, three things should be true:

The leader's probability to win has held at 95% or above, not just touched it.
Every variant has enough conversions to be measured, not just the leader.
The test has run long enough to cover a representative slice of traffic — resist calling it on day one.

Treat any reported lift above roughly 10% as suspect until a confirmatory re-test backs it up: big wins are more often measurement error than miracles. And a test that picks no winner still taught you something about your audience — a clear "no difference" is a real result, not a failure.
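As a quick arithmetic check, here is a small sketch that flags a lift for re-testing. It assumes "lift" means the relative improvement in conversion rate over the original, which this page does not define explicitly. The function names and example counts are hypothetical.

```python
def relative_lift(control, variant):
    """Relative lift of variant over control; each is (conversions, visitors)."""
    control_rate = control[0] / control[1]
    variant_rate = variant[0] / variant[1]
    return (variant_rate - control_rate) / control_rate


def needs_retest(control, variant, suspect_above=0.10):
    """True when the reported lift is large enough to treat as suspect."""
    return relative_lift(control, variant) > suspect_above


# 4.0% vs 6.2% conversion: a 55% relative lift, worth a confirmatory re-test.
print(needs_retest((40, 1000), (62, 1000)))  # True
# 5.0% vs 5.3% conversion: a 6% relative lift, under the rough 10% mark.
print(needs_retest((50, 1000), (53, 1000)))  # False
```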

Prefer a classic split?

Bandit allocation is the default for new tests, but it's optional. Tests already running keep their existing behavior, and a filter switches new tests back to a classic even split if that's what you want.