Reputation: 75
I wrote a program that records how many times 2 fair dice need to be rolled to match the probabilities for each result that we should expect.
I think it works but I'm wondering if there's a more resource friendly way to solve this problem.
import random
expected = [0.0, 0.0, 0.028, 0.056, 0.083,
0.111, 0.139, 0.167, 0.139, 0.111,
0.083, 0.056, 0.028]
results = [0.0] * 13 # store our empirical results here
emp_percent = [0.0] * 13 # results / by count
count = 0.0 # how many times have we rolled the dice?
while True:
r = random.randrange(1,7) + random.randrange(1,7) # roll our die
count += 1
results[r] += 1
emp_percent = results[:]
for i in range(len(emp_percent)):
emp_percent[i] /= count
emp_percent[i] = round(emp_percent[i], 3)
if emp_percent == expected:
break
print(count)
print(emp_percent)
Upvotes: 3
Views: 1076
Reputation: 52008
Rather than trying to match floating point numbers -- you could try to match expected values for each possible sum. This is equivalent to what you are trying to do since (observed number)/(number of trials) == (theoretical probability) if and only if the observed number equals the expected number. These will always be an integer exactly when the number of rolls is a multiple of 36. Hence, if the number of rolls is not a multiple of 36 then it is impossible for your observations to equal expectations exactly.
To get the expected values, note that the numerators that appear in the exact probabilities of the various sums (1,2,3,4,5,6,5,4,3,2,1 for the sums 2,3,..., 12 respectively) are the expected values for the sums if the dice are rolled 36 times. If the dice are rolled 36i times then multiply these numerators by i to get the expected values of the sums. The following code simulates repeatedly rolling a pair of fair dice 36 times, accumulating the total counts and then comparing them with the expected counts. If there is a perfect match, the number of trials (where a trial is 36 rolls) needed to get the match is returned. If this doesn't happen by max_trials
, a vector showing the discrepancy between the final counts and final expected value is given:
import random
def roll36(counts):
for i in range(36):
r1 = random.randint(1,6)
r2 = random.randint(1,6)
counts[r1+r2 - 2] += 1
def match_expected(max_trials):
counts = [0]*11
numerators = [1,2,3,4,5,6,5,4,3,2,1]
for i in range(1, max_trials+1):
roll36(counts)
expected = [i*j for j in numerators]
if counts == expected:
return i
#else:
return [c-e for c,e in zip(counts,expected)]
Here is some typical output:
>>> match_expected(1000000)
[-750, 84, 705, -286, 5783, -3504, -1208, 1460, 543, -1646, -1181]
Not only have the exact expected values never been observed in 36 million simulated rolls of a pair of fair dice, in the final state the discrepancies between observations and expectations have become quite large (in absolute value -- the relative discrepancies are approaching zero, as the law of large numbers predicts). This approach is unlikely to ever yield a perfect match. A variation that would work (while still focusing on expected numbers) would be to iterate until the observations pass a chi-squared goodness of fit test when compared with the theoretical distribution. In that case there would no longer be any reason to focus on multiples of 36.
Upvotes: 0
Reputation: 30161
There are several problems here.
Firstly, there is no guarantee that this will ever terminate, nor is it particularly likely to terminate in a reasonable amount of time. Ignoring floating point arithmetic issues, this should only terminate when your numbers are distributed exactly right. But the law of large numbers does not guarantee this will ever happen. The law of large numbers works like this:
Notice that the initial bias is never counterbalanced. Rather, it is dwarfed by the rest of the results. This means the bias tends to zero, but it does not guarantee the bias actually vanishes in a finite number of trials. Indeed, it specifically predicts that progressively smaller amounts of bias will continue to exist indefinitely. So it would be entirely possible that this algorithm never terminates, because there's always that tiny bit of bias still hanging around, statistically insignificant, but still very much there.
That's bad enough, but you're also working with floating point, which has its own issues; in particular, floating point arithmetic violates lots of conventional rules of math because the computer keeps doing intermediate rounding to ensure the numbers continue to fit into memory, even if they are repeating (in base 2) or irrational. The fact that you are rounding the empirical percents to three decimal places doesn't actually fix this, because not all terminating decimals (base 10) are terminating binary values (base 2), so you may still find mismatches between your empirical and expected values. Instead of doing this:
if emp_percent == expected:
break
...you might try this (in Python 3.5+ only):
if all(map(math.is_close, emp_percent, expected)):
break
This solves both problems at once. By default, math.is_close()
requires the values to be within (about) 9 decimal places of one another, so it inserts the necessary give for this algorithm to actually have a chance of working. Note that it does require special handling for comparisons involving zero, so you may need to tweak this code for your use case, like this:
is_close = functools.partial(math.is_close, abs_tol=1e-9)
if all(map(is_close, emp_percent, expected)):
break
math.is_close()
also removes the need to round your empiricals, since it can do this approximation for you:
is_close = functools.partial(math.is_close, rel_tol=1e-3, abs_tol=1e-5)
if all(map(is_close, emp_percent, expected)):
break
If you really don't want these approximations, you will have to give up floating point and work with fractions
exclusively. They produce exact results when divided by one another. However, you still have the problem that your algorithm is unlikely to terminate quickly (or perhaps at all), for the reasons discussed above.
Upvotes: 1