Statistical Significance Interpretation
ABConvert includes tools that help you determine the statistical significance of your test results. They use hypothesis testing to check whether differences between test groups reflect a real effect or merely noise. For reliable insights, ensure your tests have sufficient data—typically at least 10,000 views and 200 orders.
One-Sided Hypothesis: Tests for an effect in one specific direction (e.g., whether a new price increases sales). Use this when you have a specific prediction about the direction of the effect.
Two-Sided Hypothesis: Allows for an effect in either direction (e.g., whether a new price changes sales, either up or down). Choose this when you are unsure about the direction of the effect or want to detect any change.
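To make the distinction concrete, here is a minimal Python sketch (the z-score value below is hypothetical) showing how the same test statistic produces different p-values under each hypothesis type:

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = 1.64  # hypothetical z-score from an A/B test

# One-sided: probability of a result at least this extreme in the predicted direction.
p_one_sided = 1 - normal_cdf(z)             # ~0.051

# Two-sided: a difference in either direction counts as extreme.
p_two_sided = 2 * (1 - normal_cdf(abs(z)))  # ~0.101

print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

Note that the two-sided p-value is twice the one-sided value, which is why two-sided tests require stronger evidence to reach significance.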
Confidence level determines how strict you are when evaluating an A/B test: the higher the confidence level, the greater the difference needed to declare the results statistically significant. It is used alongside the p-value, which is calculated from your data.
For example, if you have a p-value of 0.05:
Under a 99% confidence level, the result is not significant because (1 - 0.05) = 95%, which is less than the required 99% confidence level.
However, at a 90% confidence level, the result is significant because 95% is greater than 90%.
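As a minimal sketch, the comparison above can be written as a simple check (assuming "significant" means the observed confidence, 1 - p, meets the required level):

```python
def is_significant(p_value: float, confidence_level: float) -> bool:
    """Significant when observed confidence (1 - p) meets the required level."""
    return (1 - p_value) >= confidence_level

print(is_significant(0.05, 0.99))  # False: 95% confidence < 99% required
print(is_significant(0.05, 0.90))  # True:  95% confidence >= 90% required
```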
The choice of confidence level depends on the cost of a Type 1 error (false positive) — when you mistakenly believe the variant is better when it’s actually not.
For example:
If you're changing the color of a "Buy Now" button and there's no development work required, you might be okay with a lower confidence level (e.g., 90%) since the cost of being wrong is low.
If you're making a complex checkout redesign that requires significant engineering effort, you'd want a higher confidence level (e.g., 99%) to reduce the risk of rolling out a change that doesn’t actually improve conversions.
In short:
Higher confidence level → Stricter test → Less chance of false positives
Lower confidence level → More relaxed test → More chance of false positives, but faster decisions
While ABConvert provides a "Conclusion" in the Statistical significance section so you can see at a glance whether your test is statistically significant, it is useful to understand each metric behind that conclusion:
Lift: The percentage change in a given metric between test groups, such as the conversion rate of the test group versus the original. Lift helps you judge the effectiveness of a change or variant in your A/B test (see the sketch after this list).
Confidence: Represents the probability that the results of an experiment are not due to random chance. It indicates how certain you can be about your test results. Common confidence levels are 90%, 95%, and 99%.
P-value: Indicator of statistical reliability. A low p-value (typically <0.05) suggests significant differences between groups.
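For example, here is a minimal sketch of the lift calculation described above (the function name is illustrative, not ABConvert's API):

```python
def lift(rate_original: float, rate_variant: float) -> float:
    """Relative (percentage) change of the variant's metric versus the original's."""
    return (rate_variant - rate_original) / rate_original * 100

print(lift(0.05, 0.06))  # 20.0 -> the variant converts 20% better, relatively
```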
The Z-score tells you how large the difference between the two conversion rates (your primary metric) is relative to what’s expected by chance. It’s calculated using:
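ABConvert's exact implementation may differ, but the standard pooled two-proportion z-test takes the form:

$$z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, \qquad \hat{p} = \frac{x_A + x_B}{n_A + n_B}$$

where $\hat{p}_A$ and $\hat{p}_B$ are the observed conversion rates, $n_A$ and $n_B$ are the sample sizes, and $\hat{p}$ is the pooled conversion rate (total conversions $x_A + x_B$ over total visitors).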
The p-value tells you the probability of getting the observed difference if there were actually no real effect.
If you’re running a one-tailed test, the p-value is the probability of getting a result at least as extreme as your observed data in the predicted direction. If you’re running a two-tailed test, that probability is doubled to account for differences in either direction.
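Putting the two together, here is a minimal Python sketch of the whole calculation, assuming the pooled two-proportion z-test shown above (function and parameter names are illustrative):

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int,
                          two_tailed: bool = True) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z_score, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # P(Z > |z|) under the standard normal distribution.
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, (2 * tail if two_tailed else tail)
```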
Let’s say you test a new button color on your checkout page:
Variant A (original button): 5% conversion rate, 1,000 users
Variant B (new button): 6% conversion rate, 1,000 users
Test type: Two-tailed (checking for any difference)
Using the formula, you calculate:
Z-score ≈ -0.98
P-value ≈ 0.327
Since p ≈ 0.327 is greater than 0.05 (the threshold at a 95% confidence level), the result is not statistically significant. This means the difference could be due to random chance rather than a real effect.
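Running the sketch above on these numbers reproduces the result:

```python
# 5% of 1,000 users = 50 conversions; 6% of 1,000 users = 60 conversions.
z, p = two_proportion_z_test(conv_a=50, n_a=1000, conv_b=60, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = -0.98, p = 0.327
```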
If p-value < 0.05: Your result is statistically significant, meaning the variant likely had a real impact.
If p-value > 0.05: Your result is not statistically significant, so the change might not be meaningful.
Note: These thresholds assume a 95% confidence level.
In summary, the ABConvert statistical significance tool helps you determine whether observed differences between your test groups reflect a real effect rather than random chance.