Multi-Armed Bandit Simulation

Chris Tralie

The simulation below lets students explore the multi-armed bandit problem. You have 40 chances to pull on 4 different slot machines. Each slot machine behaves randomly but has a fixed probability distribution; in other words, certain machines will pay out more on average over time than others, but you don't know which ones. Your goal is to maximize your total payout at the end. You should do this exactly once in one browser window, and be mindful of the strategy that you're using to maximize your reward.
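If you want to experiment with strategies offline after your one attempt in the browser, the setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the simulation on this page: the `Bandit` class, the Gaussian payouts, and the example means are all hypothetical, since the real machines' distributions are hidden. The strategy shown, epsilon-greedy (explore a random machine with probability `epsilon`, otherwise pull the machine with the best average so far), is one common approach among many.

```python
import random


class Bandit:
    """Hypothetical stand-in for the slot machines (NOT the page's actual code).

    Assumption: each machine pays out a Gaussian reward around a fixed,
    hidden mean. The real simulation may use different distributions.
    """

    def __init__(self, means, stdev=1.0, seed=None):
        self.means = list(means)
        self.stdev = stdev
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Draw one random payout from the chosen machine.
        return self.rng.gauss(self.means[arm], self.stdev)


def play_epsilon_greedy(bandit, n_pulls=40, epsilon=0.2, seed=None):
    """Play n_pulls rounds; return (total payout, pulls per arm, average per arm)."""
    rng = random.Random(seed)
    n_arms = len(bandit.means)
    counts = [0] * n_arms       # how many times each machine was pulled
    averages = [0.0] * n_arms   # running average payout of each machine
    total = 0.0
    for _ in range(n_pulls):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]                 # try every machine once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)      # explore: pick a random machine
        else:
            arm = max(range(n_arms), key=lambda a: averages[a])  # exploit
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental update of the running average for this machine.
        averages[arm] += (reward - averages[arm]) / counts[arm]
        total += reward
    return total, counts, averages


if __name__ == "__main__":
    # Example means are made up for illustration only.
    bandit = Bandit(means=[1.0, 2.0, 3.0, 1.5], seed=0)
    total, counts, averages = play_epsilon_greedy(bandit, seed=1)
    print("total payout:", round(total, 2))
    print("pulls per machine:", counts)
```

Changing `epsilon` shows the trade-off the exercise is about: a small value commits early to whichever machine looked best after a few pulls, while a large value spends more of the 40 pulls gathering information.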