In a previous post I discussed the potential benefits of using Monte Carlo simulations in place of traditional statistical modeling. Here, I would like to consider an interesting stats puzzle that I found to be non-trivial (read: hard), but which has a relatively tidy solution. First, we will investigate the solution using Monte Carlo methods. Second, we will derive an analytical solution. Hopefully the results from these two methods agree. If charity begins at home, probabilistic insights start with dice, so that is where we will begin.
The problem statement
Consider 3 dice sitting on the table in front of you. You pick up the dice and roll them all at once. The result?
4, 6, 5. Now take the minimum of these rolls: "4".
Write this number down and roll again: "1, 2, 1", with a minimum of "1". Rolling one more time produces "5, 5, 4", with a minimum of "4". Now let's look at the average of these minimums.
mean([4, 1, 4]) = 3
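The bookkeeping in this example is small enough to write out directly. Here is a minimal Python sketch with the three rolls from the text hard-coded; the variable names are my own, chosen for illustration.

```python
# The three rolls from the example above, hard-coded
rolls = [(4, 6, 5), (1, 2, 1), (5, 5, 4)]

# Keep only the smallest face from each roll
minimums = [min(roll) for roll in rolls]

# Average of the recorded minimums
average = sum(minimums) / len(minimums)
print(minimums, average)  # [4, 1, 4] 3.0
```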
After repeating this process 3 times, the average of the minimums is 3. What if we did this an infinite number of times? What value do we expect the average to converge to? Hint: it's not 1.
We can start this problem by simulating the process. A simulation will give us a value to "sanity check" our eventual analytical solution. Here is some Python code for running a Monte Carlo simulation for this problem statement.
From this code you can see that I use
minimum_of_dice to represent a single trial (one roll of three dice),
run_trials to scale up to N trials and record the results.
I calculate the uncertainty in the mean using the following formula:
$$\bar{x} \pm z \frac{\sigma}{\sqrt{N}}$$
where
x̄ is the mean of the recorded minimums,
σ is the standard deviation of the data set,
z = 2, approximately the z-score for a 95% confidence interval (the exact value is 1.96),
N = 100,000 is the number of trials in this test.
Using this calculation, we can say with roughly 95% confidence that the expected minimum roll falls between 2.0333 and 2.0477.
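The original listing is not reproduced here, so the following is only a minimal sketch of such a simulation. The names minimum_of_dice and run_trials follow the description above, but their signatures and bodies are my own assumptions, as is the hypothetical helper confidence_interval. It uses z = 2 and N = 100,000 as in the text; because the random draws differ, the printed interval will not exactly match the one quoted above.

```python
import math
import random
import statistics


def minimum_of_dice(n_dice=3, sides=6):
    """Roll n_dice fair dice once and return the smallest face."""
    return min(random.randint(1, sides) for _ in range(n_dice))


def run_trials(n_trials=100_000, n_dice=3, sides=6):
    """Repeat the single trial n_trials times and record every minimum."""
    return [minimum_of_dice(n_dice, sides) for _ in range(n_trials)]


def confidence_interval(results, z=2.0):
    """Hypothetical helper: return (mean, low, high) for mean +/- z * sigma / sqrt(N)."""
    n = len(results)
    mean = sum(results) / n
    sigma = statistics.stdev(results)  # sample standard deviation
    half_width = z * sigma / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    random.seed(0)  # fixed seed so repeated runs give the same numbers
    results = run_trials()
    mean, low, high = confidence_interval(results)
    print(f"mean = {mean:.4f}, interval = ({low:.4f}, {high:.4f})")
```

Passing n_dice or sides to run_trials gives a quick numerical check for other dice configurations.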
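We are looking for the expected minimum from rolling 3 dice. Let X_1, X_2, X_3 be the results of three independent fair six-sided dice, each uniform on {1, ..., 6}. So we have:
$$X_{\min} = \min(X_1, X_2, X_3), \qquad \text{and we want } E[X_{\min}].$$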
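To get the expected minimum, let's first determine the CDF for X_min. The minimum exceeds k only when all three dice exceed k, and each die does so independently with probability (6 - k)/6:
$$P(X_{\min} \le k) = 1 - P(X_1 > k)\,P(X_2 > k)\,P(X_3 > k) = 1 - \left(\frac{6-k}{6}\right)^3, \quad k = 0, 1, \dots, 6.$$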
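Since X_min > k exactly when all three dice exceed k, the CDF should be P(X_min <= k) = 1 - ((6 - k)/6)^3. With only 6^3 = 216 equally likely outcomes, a brute-force count can confirm this exactly. The sketch below is my own check, not part of the original post; the function names are hypothetical, and it uses exact fractions so there is no floating-point noise.

```python
from fractions import Fraction
from itertools import product

SIDES, N_DICE = 6, 3

# Every equally likely outcome of rolling three six-sided dice
OUTCOMES = list(product(range(1, SIDES + 1), repeat=N_DICE))


def cdf_formula(k):
    """Closed form: P(X_min <= k) = 1 - ((6 - k) / 6)^3."""
    return 1 - Fraction(SIDES - k, SIDES) ** N_DICE


def cdf_enumerated(k):
    """Fraction of all outcomes whose minimum is at most k."""
    hits = sum(1 for roll in OUTCOMES if min(roll) <= k)
    return Fraction(hits, len(OUTCOMES))


for k in range(1, SIDES + 1):
    assert cdf_formula(k) == cdf_enumerated(k)
    print(k, cdf_formula(k))
```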
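Since we are working in a discrete setting, we can calculate the probability mass function (PMF) for X_min by differencing the CDF:
$$P(X_{\min} = k) = P(X_{\min} \le k) - P(X_{\min} \le k-1) = \frac{(7-k)^3 - (6-k)^3}{216}, \quad k = 1, \dots, 6.$$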
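Tabulating these probabilities:
$$\begin{array}{c|cccccc} k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline P(X_{\min} = k) & \tfrac{91}{216} & \tfrac{61}{216} & \tfrac{37}{216} & \tfrac{19}{216} & \tfrac{7}{216} & \tfrac{1}{216} \end{array}$$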
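These probabilities can also be generated mechanically by differencing the CDF, P(X_min = k) = P(X_min <= k) - P(X_min <= k - 1). Here is a minimal sketch, assuming the closed-form CDF 1 - ((6 - k)/6)^3 for three six-sided dice. It prints each probability over the common denominator 216 = 6^3.

```python
from fractions import Fraction

SIDES, N_DICE = 6, 3


def cdf(k):
    """P(X_min <= k) for three fair six-sided dice."""
    return 1 - Fraction(SIDES - k, SIDES) ** N_DICE


# P(X_min = k) = F(k) - F(k - 1), where F(0) = 0
pmf = {k: cdf(k) - cdf(k - 1) for k in range(1, SIDES + 1)}

denominator = SIDES ** N_DICE  # 216 equally likely outcomes
for k, p in pmf.items():
    # p * 216 is a whole number, so this prints the count of favourable outcomes
    print(k, f"{p * denominator}/{denominator}")
```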
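Finally, using the formula for expected value:
$$E[X_{\min}] = \sum_{k=1}^{6} k\,P(X_{\min} = k) = \frac{1\cdot 91 + 2\cdot 61 + 3\cdot 37 + 4\cdot 19 + 5\cdot 7 + 6\cdot 1}{216} = \frac{441}{216} = \frac{49}{24} \approx 2.0417$$
This 49/24 figure is our answer. It falls within our confidence interval of (2.0333, 2.0477), so I am feeling pretty good about the correctness of our answer.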
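As a further check on 49/24, the expected value can be computed two independent ways with exact fractions: from the PMF, and by averaging the minimum over all 216 outcomes. This is a minimal sketch of my own, assuming three fair six-sided dice; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

SIDES, N_DICE = 6, 3


def cdf(k):
    """P(X_min <= k) for N_DICE fair dice with SIDES faces."""
    return 1 - Fraction(SIDES - k, SIDES) ** N_DICE


# Expected value from the PMF: E[X_min] = sum over k of k * P(X_min = k)
expected = sum(k * (cdf(k) - cdf(k - 1)) for k in range(1, SIDES + 1))

# Independent check: average the minimum over every equally likely roll
outcomes = list(product(range(1, SIDES + 1), repeat=N_DICE))
brute_force = Fraction(sum(min(roll) for roll in outcomes), len(outcomes))

assert expected == brute_force
print(expected, f"{float(expected):.4f}")  # 49/24 2.0417
```

Changing SIDES and N_DICE gives the exact answer for other configurations, which may be handy for checking your own work on the exercise below.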
This methodology can be generalized to dice with more than 6 sides or trials with more than three dice. I will leave that as an exercise for the reader. If you discover a novel approach to this problem, or have a similar puzzle you think I might enjoy, contact me on social media!