PaneliaTools

Sample size for an A/B test

Most A/B tests fail before they start: each variant gets too little traffic, so the observed difference stays within statistical noise. Before launching, estimate the order of magnitude you need: this calculator, preset to a tight ±2% margin, gives the number of users required to measure each variant precisely. Count that volume IN EACH branch of the test.

Keep the difference in goals in mind: a survey estimates a single proportion, while an A/B test detects a gap between two proportions. The size below guarantees the measurement precision of each variant, not the ability to detect a given difference between them. To rigorously size the detection of a small uplift (e.g. +1 conversion point), complement it with a statistical power calculation that includes the minimum detectable effect (MDE) and the β risk (the probability of missing a real effect; power = 1 − β).
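For the power calculation mentioned above, one standard approach is the normal-approximation formula for comparing two independent proportions. The sketch below assumes a two-sided test, equal traffic per branch, α = 5% and 80% power; the 3% baseline, the +1-point uplift and the function name are illustrative assumptions, not part of this calculator.

```python
import math
from statistics import NormalDist

def ab_sample_size_per_variant(p_base: float, p_variant: float,
                               alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH branch to detect p_base -> p_variant (two-sided test).

    Normal-approximation formula for two independent proportions,
    assuming equal group sizes. Hypothetical helper for illustration.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power (beta = 20%)
    p_bar = (p_base + p_variant) / 2               # pooled rate under "no difference"
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_base * (1 - p_base)
                                      + p_variant * (1 - p_variant))) ** 2
    mde = p_variant - p_base                       # minimum detectable effect
    return math.ceil(numerator / mde ** 2)

# Hypothetical example: 3% baseline, detect +1 point (3% -> 4%)
print(ab_sample_size_per_variant(0.03, 0.04))
```

With these inputs the result is a little over 5,000 users per branch, roughly twice the 2,401 that the ±2% precision target alone would suggest.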

Confidence level

95% is the market-research standard. Corresponding z-scores: 1.645 (90%), 1.96 (95%), 2.576 (99%), per the NIST statistical tables.
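These z-scores are the two-sided critical values of the standard normal distribution. A minimal sketch to recompute them with Python's standard library (`statistics.NormalDist`, Python 3.8+); the helper name is hypothetical:

```python
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_score(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```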

Margin of error

The acceptable gap between your sample's estimate and the true value in the population. ±5% is the most common choice.

Expected proportion

If unsure, leave it at 50%: that is the worst case, which requires the largest sample.

Population size

The total number of people in your target population. Above roughly 100,000, its effect on the result is negligible: leave the field empty.
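The population field typically feeds a finite population correction. The sketch below uses one common form of it, n = n₀ / (1 + (n₀ − 1) / N); whether this calculator applies exactly this formula is an assumption, and the function name is hypothetical. It shows why large populations can be left empty:

```python
import math

def finite_population_size(n0: float, population: int) -> int:
    """Adjust an infinite-population sample size n0 for a population of size N.

    Uses the common correction n = n0 / (1 + (n0 - 1) / N), rounded up.
    """
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# n0 = 2,401 (95% confidence, ±2% margin, 50% expected proportion)
for population in (1_000, 10_000, 100_000, 1_000_000):
    print(population, finite_population_size(2401, population))
```

At 100,000 the correction trims the sample by only about 2% (2,401 to 2,345); at 1,000 it cuts it by more than two thirds (to 707).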

Respondents needed

2,401

You need 2,401 respondents for a 95% confidence level with a ±2% margin of error.
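The 2,401 figure follows from the usual sample-size formula for a proportion (large population, no finite-population correction), with z the critical value, p the expected proportion and e the margin of error:

```latex
n = \frac{z^2 \, p(1-p)}{e^2}
  = \frac{1.96^2 \times 0.5 \times 0.5}{0.02^2}
  = \frac{0.9604}{0.0004}
  = 2401
```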


How many respondents per precision level?

Precision is expensive: the sample size scales with 1/margin², so going from ±5% to ±2% multiplies the sample by (5/2)² ≈ 6.25 (385 to 2,401 at 95% confidence).
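In the standard formula n = z²·p(1−p)/e², everything except the margin e stays fixed when only the precision changes, so the ratio of two sample sizes is the inverse ratio of the margins, squared:

```latex
\frac{n_{\pm 2\%}}{n_{\pm 5\%}}
  = \left(\frac{0.05}{0.02}\right)^2
  = 6.25
```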

[Chart: respondents needed (log scale, 10 to 10,000) by margin of error (1% to 10%)]

Summary table

Sample size for the most common combinations (50% expected proportion, large population).

Confidence    ±3%     ±5%    ±10%
90%           752     271      68
95%         1,068     385      97
99%         1,844     664     166
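The table can be regenerated from the standard formula n = z²·p(1−p)/e², rounded up, assuming p = 50% and a population large enough to ignore the finite-population correction. A minimal sketch; the function names are hypothetical:

```python
import math
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Respondents needed to estimate a proportion p within ±margin."""
    z = z_score(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

margins = (0.03, 0.05, 0.10)
print("Confidence  " + "  ".join(f"±{m:.0%}".rjust(6) for m in margins))
for level in (0.90, 0.95, 0.99):
    row = "  ".join(f"{sample_size(level, m):>6,}" for m in margins)
    print(f"{level:<10.0%}  {row}")
```

Using exact z-scores rather than the rounded 1.645 / 1.96 / 2.576 gives the same rounded-up sizes for every cell.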

Sample size: done. Now, the fieldwork…

Traditional fieldwork takes 6 weeks and $10,000. Panelia simulates 300+ calibrated respondents in 10 minutes.


Frequently asked questions

Is the result for the whole test or per variant?
Per variant. An A/B test at ±2% and 95% confidence needs 2,401 users in branch A AND 2,401 in branch B (4,802 in total).
Why ±2% rather than ±5% for an A/B test?
Because real conversion gaps are often small (1 to 3 points). With a ±5% margin, a 2-point uplift is undetectable: the two variants' confidence intervals overlap heavily.
When can I stop my test?
When each variant has reached the size calculated IN ADVANCE, not when the difference turns significant. Stopping a test as soon as it 'goes green' (known as peeking) drastically inflates the false-positive rate.
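A small Monte Carlo sketch makes the peeking problem concrete. It simulates A/A tests (both branches share the same true conversion rate, so every "significant" result is a false positive) and compares a single check at the planned size with checking every 100 users and stopping at the first |z| > 1.96. The 10% conversion rate, the peeking interval, the pooled two-proportion z-test and the function names are all illustrative assumptions.

```python
import math
import random

def two_proportion_z(conv_a: int, conv_b: int, n: int) -> float:
    """Pooled two-proportion z statistic, with n users in each branch."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
    return 0.0 if se == 0 else (conv_a - conv_b) / n / se

def aa_false_positive_rates(n_per_variant=2401, rate=0.10, peek_every=100,
                            simulations=500, z_crit=1.96, seed=1):
    """Return (peeking rate, fixed-horizon rate) of false positives in A/A tests."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(simulations):
        conv_a = conv_b = 0
        stopped_early = False
        for i in range(1, n_per_variant + 1):
            # One new user per branch; both branches convert at the same true rate.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            is_peek = i % peek_every == 0 or i == n_per_variant
            if is_peek and not stopped_early:
                if abs(two_proportion_z(conv_a, conv_b, i)) > z_crit:
                    stopped_early = True  # a peeking analyst would stop here
        peek_hits += stopped_early
        # Fixed-horizon analysis: one test at the pre-computed size.
        if abs(two_proportion_z(conv_a, conv_b, n_per_variant)) > z_crit:
            final_hits += 1
    return peek_hits / simulations, final_hits / simulations

peeking, fixed = aa_false_positive_rates()
print(f"false positives with peeking: {peeking:.1%}, single final test: {fixed:.1%}")
```

Expect the single final check to flag roughly 5% of A/A tests and the peeking strategy several times more, even though no real difference exists.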
My conversion rate is 3%, not 50%. What changes?
Enter 3% as the 'expected proportion': p·(1−p) shrinks from 0.25 to about 0.029, so the required sample drops. The 50% setting remains the cautious choice if you don't know your base rate.
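A quick check of that effect with the standard formula n = z²·p(1−p)/e², keeping 95% confidence and the ±2% margin; the helper name is hypothetical:

```python
import math
from statistics import NormalDist

def sample_size(p: float, margin: float = 0.02, confidence: float = 0.95) -> int:
    """Respondents needed to estimate a proportion p within ±margin (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size(0.50))  # worst case, p·(1−p) = 0.25 -> 2401
print(sample_size(0.03))  # p·(1−p) = 0.0291 -> 280
```

In this formula the margin is absolute (percentage points): ±2 points around a 3% rate spans 1% to 5%, so with low base rates a tighter margin is usually worth considering.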