More efficient simluation of 2 dice rolls - Python

Question

I wrote a program that records how many times 2 fair dice need to be rolled to match the probabilities for each result that we should expect.

I think it works but I'm wondering if there's a more resource friendly way to solve this problem.

 import random

 expected = [0.0, 0.0, 0.028, 0.056, 0.083, 
             0.111, 0.139, 0.167, 0.139, 0.111,
             0.083, 0.056, 0.028]

 results = [0.0] * 13  # store our empirical results here

 emp_percent = [0.0] * 13  # results / by count

 count = 0.0  # how many times have we rolled the dice? 

 while True:
     r = random.randrange(1,7) + random.randrange(1,7)  # roll our die
     count += 1 
     results[r] += 1
     emp_percent = results[:]

     for i in range(len(emp_percent)):
         emp_percent[i] /= count
         emp_percent[i] = round(emp_percent[i], 3)

     if emp_percent == expected:
         break

print(count)
print(emp_percent)

Kevin · Accepted Answer

There are several problems here.

Firstly, there is no guarantee that this will ever terminate, nor is it particularly likely to terminate in a reasonable amount of time. Ignoring floating point arithmetic issues, this should only terminate when your numbers are distributed exactly right. But the law of large numbers does not guarantee this will ever happen. The law of large numbers works like this:

Your initial results are (by random chance) almost certainly biased one way or another.
Eventually, the trials not yet performed will greatly outnumber your initial trials, and the lack of bias in those later trials will outweigh your initial bias.

Notice that the initial bias is never counterbalanced. Rather, it is dwarfed by the rest of the results. This means the bias tends to zero, but it does not guarantee the bias actually vanishes in a finite number of trials. Indeed, it specifically predicts that progressively smaller amounts of bias will continue to exist indefinitely. So it would be entirely possible that this algorithm never terminates, because there's always that tiny bit of bias still hanging around, statistically insignificant, but still very much there.

That's bad enough, but you're also working with floating point, which has its own issues; in particular, floating point arithmetic violates lots of conventional rules of math because the computer keeps doing intermediate rounding to ensure the numbers continue to fit into memory, even if they are repeating (in base 2) or irrational. The fact that you are rounding the empirical percents to three decimal places doesn't actually fix this, because not all terminating decimals (base 10) are terminating binary values (base 2), so you may still find mismatches between your empirical and expected values. Instead of doing this:

if emp_percent == expected:
    break

...you might try this (in Python 3.5+ only):

if all(map(math.is_close, emp_percent, expected)):
    break

This solves both problems at once. By default, math.is_close() requires the values to be within (about) 9 decimal places of one another, so it inserts the necessary give for this algorithm to actually have a chance of working. Note that it does require special handling for comparisons involving zero, so you may need to tweak this code for your use case, like this:

is_close = functools.partial(math.is_close, abs_tol=1e-9)
if all(map(is_close, emp_percent, expected)):
    break

math.is_close() also removes the need to round your empiricals, since it can do this approximation for you:

is_close = functools.partial(math.is_close, rel_tol=1e-3, abs_tol=1e-5)
if all(map(is_close, emp_percent, expected)):
    break

If you really don't want these approximations, you will have to give up floating point and work with fractions exclusively. They produce exact results when divided by one another. However, you still have the problem that your algorithm is unlikely to terminate quickly (or perhaps at all), for the reasons discussed above.

More efficient simluation of 2 dice rolls - Python

Answers (2)

Related Questions