A/B testing and multi-armed bandit testing best practices

Most people in the marketing game have heard of A/B testing, but they may be less familiar with multi-armed bandit testing. The term ‘multi-armed bandit’ derives from slot machines, nicknamed ‘one-armed bandits’: a gambler facing a row of them has to decide which machines to play, how many times, and in which order to get the maximum payout. Any sort of testing helps your company generate insights into your customers’ preferences and figure out designs that convert website visitors to leads and customers. Let’s explore A/B testing and multi-armed bandit testing best practices.

What is A/B testing?

A/B testing is quite simple in that it’s a comparison of two variations of the same page, email, or marketing asset. For example, you might create two versions of a landing page and test if more customers convert with choice A or choice B. You’ll set the parameters of your testing, i.e. how long will you test for and when will you determine the outcome? You can even do smaller A/B tests such as changing a blog post title and testing which version gets more clicks or changing the color of your CTA button. A/B testing can be as small or wide in scope as you make it. But, either way, it provides invaluable insights into what your customer wants.
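
To make ‘determining the outcome’ a little more concrete, here is a minimal sketch of one common way to compare two variations: a two-proportion z-test on conversion counts. The function name and the visitor numbers are made up for illustration, and this is only one of several reasonable ways to judge a test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B convert equally well
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No conversions at all (or everyone converted): nothing to compare
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 4,000 visitors per landing page
z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a common, if arbitrary, cut-off) suggests the difference between A and B is unlikely to be down to chance alone.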

What is multi-armed bandit testing?

Multi-armed bandit testing is a more complex and technical form of A/B testing that uses machine learning algorithms to divert traffic to variations that perform well and allocate less traffic to underperforming pages. Because traffic is reallocated automatically while the test is still running, the winning variation can be identified and put to work much sooner.

Back to the slot machines: a gambler facing a row of one-armed bandits has to make quick decisions and take multiple actions that each have an unknown payout. The goal is to make the most profitable choice, i.e. hit the slot machine that wins big at the right time. The same is true for your website. You need to create variations that target your customers in a way that will give you the most profitable payout.

Think about a news website. Unless you have created an account and the outlet has collected your data, it takes a risk with everything it displays on its homepage for you, from the ads to the articles. What does it show you? How does it determine what will get your attention? Its goal is to increase ad revenue and clicks. It wants to show content that is of interest to you, which is difficult when it doesn’t know your preferences. The website has to make decisions in the hope that those decisions will pay off.

What are the types of multi-armed bandit testing?

Epsilon-greedy

  • Balances exploration with exploitation
  • Greedy experiments always pull the lever with the highest known payout, except when a random action is taken
  • Arms are chosen at random a small fraction of the time (epsilon)
  • The rest of the time, the arm with the highest observed payout is pulled
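
The bullets above describe what is usually called an epsilon-greedy policy, and it can be sketched in a few lines. This is a minimal illustration rather than a production setup: the class name, the 10% exploration rate, and the simulated conversion rates are all assumptions chosen for the example.

```python
import random

class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon            # fraction of pulls made at random
        self.counts = [0] * n_arms        # pulls per arm
        self.values = [0.0] * n_arms      # average observed payout per arm
        self.rng = rng or random.Random()

    def select_arm(self):
        # Explore: pick any arm at random a fraction epsilon of the time
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: otherwise pull the arm with the highest observed payout
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running average of the rewards seen on this arm
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulate(true_rates, rounds=5000, seed=0):
    """Simulate visitors with made-up conversion rates; return pulls per arm."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(len(true_rates), epsilon=0.1, rng=rng)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = 1 if rng.random() < true_rates[arm] else 0
        bandit.update(arm, reward)
    return bandit.counts

print(simulate([0.05, 0.10]))  # pulls sent to each hypothetical page variation
```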

Upper confidence bound

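  • Based on the principle of optimism in the face of uncertainty: the less an arm has been tried, the more benefit of the doubt it gets
  • Think about scratch cards: a type of card you have barely tried could still be a big winner, so it is worth scratching a few more before writing it off
  • Each arm is scored as if its payoff were as high as the data collected so far plausibly allows (its upper confidence bound), and the arm with the highest score is pulled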
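
One widely used version of this idea is the UCB1 rule, which adds an ‘uncertainty bonus’ to each arm’s average payout and pulls the arm with the highest total. The sketch below assumes rewards between 0 and 1 (for example, 1 for a conversion and 0 otherwise); the class name is illustrative.

```python
import math

class UCB1:
    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # average observed payout per arm

    def select_arm(self):
        # Try every arm once before comparing scores
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Score = average payout + a bonus that shrinks as an arm is pulled more
        scores = [
            value + math.sqrt(2 * math.log(total) / count)
            for value, count in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=lambda i: scores[i])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental running average of the rewards seen on this arm
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

Because the bonus is larger for arms with few pulls, a variation that has barely been shown can outrank one with a better average so far, which is how the rule keeps exploring.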

Thompson sampling (Bayesian)

  • Randomized probability matching
  • The share of pulls a lever receives matches the estimated probability of it being the optimal lever
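
For yes/no outcomes such as ‘converted or not’, Thompson sampling is often sketched with a Beta distribution per arm: draw one plausible conversion rate for each arm and pull the arm with the highest draw. The version below assumes a uniform Beta(1, 1) starting point for every arm; the class and parameter names are illustrative.

```python
import random

class ThompsonSampling:
    def __init__(self, n_arms, rng=None):
        self.successes = [0] * n_arms   # conversions seen per arm
        self.failures = [0] * n_arms    # non-conversions seen per arm
        self.rng = rng or random.Random()

    def select_arm(self):
        # Draw a plausible conversion rate for each arm from its Beta posterior
        samples = [
            self.rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        # Pull the arm whose sampled rate is highest
        return max(range(len(samples)), key=lambda i: samples[i])

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1
```

An arm that is probably the best wins the draw most of the time, while an arm that is only possibly the best still wins occasionally, which is where the probability matching comes from.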

Contextual bandit

  • When context is available about a visitor, such as previously visited pages, past purchases, device information, or location, the algorithm can use it to choose a variation for that visitor, which can increase click-through rate
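
As a deliberately simplified sketch, a contextual bandit can be approximated by keeping a separate set of payout estimates for each context value, here a hypothetical device type. Real contextual bandits usually learn a model over many context features instead of one table per value, so treat this as an illustration of the idea only.

```python
import random
from collections import defaultdict

class ContextualEpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        # One list of counts and average payouts per context value
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.values = defaultdict(lambda: [0.0] * n_arms)

    def select_arm(self, context):
        # Explore at random a fraction epsilon of the time
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        # Otherwise pull the best-known arm for this particular context
        values = self.values[context]
        return max(range(self.n_arms), key=lambda i: values[i])

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        # Incremental running average for this context and arm
        self.values[context][arm] += (reward - self.values[context][arm]) / n

bandit = ContextualEpsilonGreedy(n_arms=2)
variation = bandit.select_arm("mobile")       # choose a page for a mobile visitor
bandit.update("mobile", variation, reward=1)  # record that the visitor clicked
```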

What are multi-armed bandit testing best practices?

With A/B testing, you may send your traffic equally 50/50 to the two variations of the page. In multi-armed bandit testing, you might spend 10% of the time testing which pages perform better and then divert traffic to the winning pages the other 90% of the time. The downside is that, because underperforming pages receive so little traffic, you need a lot more data and a lot more traffic to reach statistical significance on which page is performing worse. With multi-armed bandit testing, the average conversion rate is higher, but the extra investment may or may not be worth the initial cost.
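
A quick back-of-the-envelope calculation shows why the average conversion rate comes out higher. The sketch below uses made-up conversion rates of 5% and 10%, and it assumes the bandit has already worked out which page is the winner and splits its 10% of exploration traffic evenly across pages; early in a real test the advantage would be smaller.

```python
def expected_rate(true_rates, traffic_shares):
    """Blended conversion rate for a given split of traffic across pages."""
    return sum(rate * share for rate, share in zip(true_rates, traffic_shares))

def explore_exploit_shares(true_rates, explore_fraction=0.1):
    """Traffic split when exploration is spread evenly and the rest goes to the best page."""
    n = len(true_rates)
    best = max(range(n), key=lambda i: true_rates[i])
    shares = [explore_fraction / n] * n
    shares[best] += 1 - explore_fraction
    return shares

rates = [0.05, 0.10]  # hypothetical conversion rates for pages A and B

ab_rate = expected_rate(rates, [0.5, 0.5])
bandit_rate = expected_rate(rates, explore_exploit_shares(rates))

print(round(ab_rate, 4))      # about 0.075 with a 50/50 split
print(round(bandit_rate, 4))  # about 0.0975 with a 10% explore / 90% exploit split
```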

Should you use A/B testing or multi-armed bandit testing?

In A/B testing, you explore options A and B over a specified period and determine the ‘winner’. But if you end up going in the direction of A, the traffic you sent to the B version for the duration of the test was wasted.

Multi-armed bandit testing is more complex. The tests are real-time and adaptive. Traffic is gradually moved towards a winning variation for you. The process is faster and more efficient, and you don’t waste time trialing variations that don’t work. But the downside is the initial cost, and multi-armed bandit tests are resource-intensive to conduct.

As a general rule, you should prefer multi-armed bandit testing when the cost of pulling the ‘losing arm’ is high, for example, when presenting the wrong ad to a target customer would mean a loss of revenue for you. It also works well when there are more than two variations to be tested.

Need help with your A/B tests or multi-armed bandit testing? 

We specialize in creating award-winning websites that meet your target audience’s needs. We understand our customer demographics because we conduct A/B tests, usability tests, and multi-armed bandit tests for ourselves and our clients. If you need help with these services, connect with Key Medium today, and unlock your potential.

Elaine, an SEO Specialist and Content Writer

Elaine Frieman holds a Master’s Degree and is a UK-based professional editor, educational writer, and former marketing agency content writer who wrote articles for disparate clients using SEO best practice. She enjoys reading, writing, walking in the countryside, traveling, spending time with other people’s cats, and going for afternoon tea.