A/B Testing: What to do when you do not have enough traffic on your site? (Part II)

Vinay Roy
Jul 10, 2020

In the previous article, we discussed a few approaches to use when a product, or a surface area of the product, does not have enough traffic. However, there are a few other issues that product managers and analysts have to deal with, the most important of which is the opportunity cost of running an A/B test. Every day that an A/B test runs, there is a risk of:

  • Delaying other high-value tests that might be in the pipeline
  • Showing an inferior version to 50% of your users, and hence giving a poor user experience to half of your user base
  • Losing monetization opportunities on the weaker variant
  • Having to explain to stakeholders, other product managers, and execs how long the test will keep running

To get around this, product and analytics teams often fall into the habit of peeking at the data: looking at the results every day and recalculating the p-value to see whether it has reached significance. In practice, it looks like this:
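For concreteness, here is roughly what that daily check computes. This is a minimal sketch that assumes a standard pooled two-proportion z-test (the article does not say which test is used); `two_proportion_p_value` is a hypothetical helper name, and the daily totals are made-up numbers for illustration only.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (control = a, treatment = b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation yet (e.g. zero conversions), so nothing to test
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical cumulative (conversions, visitors) per day: (control, treatment).
daily_totals = [
    ((12, 500), (19, 500)),
    ((25, 1000), (41, 1000)),
    ((40, 1500), (55, 1500)),
]

ALPHA = 0.05
for day, ((c_conv, c_n), (t_conv, t_n)) in enumerate(daily_totals, start=1):
    p = two_proportion_p_value(c_conv, c_n, t_conv, t_n)
    verdict = "significant, tempting to stop" if p < ALPHA else "keep running"
    print(f"day {day}: p = {p:.3f} ({verdict})")
```

In this made-up series the p-value dips below 0.05 on day 2 and rises back above it on day 3, so a team that stops at the first significant look would have called a winner on day 2.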

Peeking at an A/B test

There is an extensive literature on the perils of peeking at p-values. The following passage from here summarizes the problem well:

“Peeking”, or sometimes “unaccounted peeking” and more precisely “unaccounted peeking with intent to stop” is a term used to describe the poor practice of checking the results of an ongoing A/B test with the intent to stop it and make a decision or inference based on the observed outcome: value of the variable of interest, reaching a given significance threshold, etc. Peeking is a significant threat to the validity of any online controlled experiment as it can uncontrollably increase the type I error rate and render any significance calculations or confidence intervals meaningless.

It is a poor practice since it causes a discrepancy between the nominal p-value you calculate and the actual p-value, which reflects the probability of observing the result you observed under the actual circumstances. With just a few looks at the data, the actual significance can be orders of magnitude larger than the nominal.
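One way to see the inflated type I error for yourself is to simulate A/A tests, where both arms share the same true conversion rate, so every declared "winner" is a false positive. The sketch below is a hypothetical Monte Carlo check: the traffic numbers, the 5% conversion rate, the number of days, and the pooled z-test are all illustrative assumptions, and `aa_false_positive_rate` is a made-up helper name.

```python
import math
import random


def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(peek_daily: bool, n_sims: int = 500, days: int = 10,
                           visitors_per_day: int = 200, rate: float = 0.05,
                           alpha: float = 0.05, seed: int = 7) -> float:
    """Share of simulated A/A tests (identical arms) that declare a 'winner'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        for day in range(days):
            # Both arms share the same true rate, so any "win" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_day))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_day))
            n += visitors_per_day
            # Without peeking, the test is evaluated only once, on the last day.
            looking = peek_daily or day == days - 1
            if looking and p_value(conv_a, n, conv_b, n) < alpha:
                false_positives += 1
                break
    return false_positives / n_sims


print("single look at the end:", aa_false_positive_rate(peek_daily=False))
print("peeking every day:     ", aa_false_positive_rate(peek_daily=True))
```

The single final look should land close to the nominal 5%, while stopping at the first of ten daily looks that crosses 0.05 produces a noticeably higher false-positive rate, even though nothing differs between the arms.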

Here are two ways to address this issue, as well as the low-traffic problem:

Sequential sampling: The biggest difference from traditional A/B testing is that with sequential sampling you can peek as often as you want. As soon as the test crosses its significance boundary, the variant (control or treatment) with the higher conversion rate is declared the winner. If no significant difference is detected by the time the experiment reaches its stopping criterion, the test is declared complete with no winner. The detailed steps of the sequential A/B testing procedure are:

  1. Before running the test, choose a sample size N (here, the total number of conversions across both variations, since the test ends when T + C reaches N)
  2. Start the test with a 50:50 split between the two variations
  3. Track the conversions for both variations. Refer to the absolute number of conversions in the treatment variation as T, and in the control as C
  4. Finish the test in favor of the treatment when T − C ≥ 2√N
  5. Finish the test with no winner when T + C ≥ N
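The steps above translate almost directly into code. Below is a minimal Python sketch; `sequential_decision` and `run_sequential_test` are hypothetical helper names, and representing the incoming conversions as a stream of "treatment"/"control" labels is an assumption about how conversions would be logged. As written in steps 4 and 5, the rule only ever declares the treatment the winner.

```python
import math
from typing import Iterable, Optional, Tuple


def sequential_decision(t: int, c: int, n: int) -> Optional[str]:
    """Apply steps 4 and 5: t and c are conversions so far, n is the planned total."""
    if t - c >= 2 * math.sqrt(n):  # step 4: treatment leads by at least 2 * sqrt(N)
        return "treatment"
    if t + c >= n:  # step 5: the conversion budget is used up
        return "no winner"
    return None  # keep running; it is fine to check again after every conversion


def run_sequential_test(events: Iterable[str], n: int) -> Tuple[Optional[str], int, int]:
    """Feed conversions one at a time (hypothetical labels) and stop at the first boundary hit."""
    t = c = 0
    for source in events:
        if source == "treatment":
            t += 1
        else:
            c += 1
        decision = sequential_decision(t, c, n)
        if decision is not None:
            return decision, t, c
    return None, t, c  # stream ended before either boundary was reached
```

For example, with N = 1,000 conversions the boundary is 2√1000 ≈ 63.2, so the treatment needs a lead of at least 64 conversions before it can be declared the winner.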

If you are interested in the derivation, I recommend reading this detailed article on the topic by Evan Miller. You can use the calculator here to compute the number of conversions needed for the control or treatment to be declared the winner.
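To see why unlimited peeking is acceptable under this rule, one can simulate it when there is no real difference. With a 50:50 split and equal conversion rates, each new conversion is equally likely to come from either arm, so T − C behaves like a simple random walk. The sketch below is a hypothetical Monte Carlo check (not taken from Evan Miller's article; N = 400 and the simulation count are arbitrary choices) that checks the boundary after every single conversion and counts how often the treatment is wrongly declared the winner.

```python
import math
import random


def sequential_false_positive_rate(n: int = 400, n_sims: int = 2000, seed: int = 42) -> float:
    """Share of no-difference runs in which T - C reaches 2 * sqrt(n) before T + C reaches n."""
    rng = random.Random(seed)
    boundary = 2 * math.sqrt(n)
    false_wins = 0
    for _ in range(n_sims):
        lead = 0  # running value of T - C
        for _ in range(n):  # at most n conversions in total (step 5)
            # Under the null, each conversion comes from treatment or control with prob 1/2.
            lead += 1 if rng.random() < 0.5 else -1
            if lead >= boundary:  # peek after every conversion (step 4)
                false_wins += 1
                break
    return false_wins / n_sims


print(sequential_false_positive_rate())
```

Even though the boundary is checked after every conversion, the false-positive rate stays near 5%. For large N, the reflection principle for random walks puts it close to 2(1 − Φ(2)), roughly 4.5%.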

Continuous monitoring: Peeking is far more likely to cause problems early in a test than later on, when you have more data. Given how much has been written on the subject, you would assume no one makes this mistake anymore, but it is surprisingly common, and more prevalent at startups than at mature companies.

This is what typical peeking leads to. The plot below shows 1 − p-value over the course of a test. As we can see, the p-value crosses the alpha threshold twice during the lifetime of the test: once early on and again later.

Source

One way to solve this is continuous monitoring with a varying significance level (alpha). In other words, the error budget (alpha) is spread across the lifetime of the run: when the test starts, we use a very low alpha, and as the test progresses we relax this criterion. In practice, it looks like this:

Budgeted confidence level plot to avoid early stoppage
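The article does not name a specific budgeting scheme. One common choice, swapped in here as an assumption, is the Lan-DeMets O'Brien-Fleming-type alpha-spending function, α(t) = 2 − 2Φ(z_{1−α/2} / √t), where t is the fraction of the planned sample collected so far. It spends almost nothing at early looks and most of the budget near the end. As a simplification, the sketch uses each look's increment of spent alpha directly as that look's p-value threshold; by the union bound this keeps the overall false-positive rate at or below α, though it is more conservative than exact group-sequential boundaries, which account for the correlation between looks. `obf_spending` and the look schedule are hypothetical.

```python
from statistics import NormalDist


def obf_spending(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided).

    t is the information fraction (0 < t <= 1); returns the cumulative alpha
    allowed to be spent by that point. At t = 1 it returns exactly alpha.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 - 2 * NormalDist().cdf(z / t ** 0.5)


# Hypothetical schedule: look at the data at 25%, 50%, 75% and 100% of the planned sample.
looks = [0.25, 0.5, 0.75, 1.0]
spent = 0.0
for t in looks:
    cumulative = obf_spending(t)
    increment = cumulative - spent  # alpha available at this look only
    spent = cumulative
    # Conservative rule: stop at this look only if the p-value is below `increment`.
    print(f"t={t:.2f}: cumulative alpha {cumulative:.5f}, "
          f"stop only if p < {increment:.5f}")
```

The per-look thresholds grow from a tiny value at the first look to the largest share of the budget at the last one, which is the "strict early, relaxed later" shape described above.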

As we discussed, there are many ways to address the low-traffic problem. No single solution fits all cases, so you will have to see what works for your specific situation. If you are using other methods that I did not cover in Part I or Part II, please share them in the comments.
