Reputation: 15783
Suppose that I want to simulate in Python a classical problem: there is a bag with 55 % red balls, and 45 % green balls.
I want to extract 10 balls, and detect what is the probabilyty of all these balls to be green.
I use a Monte Carlo simulation, using the function random.sample(balls, 10) like this:
from random import choice, shuffle, sample, randint
Red = False
Green = True
bags = [Red for _ in range(55)]
bags.extend([Green for _ in range(45)])
# shuffle(bags)
def testonce():
return all(sample(bags, 10))
def test(N):
K = 0
for _ in xrange(N): K += testonce()
return float(K)/N
print '/', test (10000000)
print ':', .45**10
This code prints the probability detected by simulation , and the real probability (the correct answer). It prints so:
/ 0.0001848
: 0.00034050628916
This difference shows me that
either the random module is wrong
either I miss something, and do something wrong in the code.
What do I miss ? How to write correctly the simulation , such that when N grows, the returned number to converge to the real probability ?
Upvotes: 0
Views: 229
Reputation: 87366
radom.sample
chose without replacement, you are computing the probability with replacement.
random.sample(population, k)
Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement.
(doc)
The correct probabilty (for you MC) is:
In [30]: np.prod(np.arange(36,46)/np.arange(91.0,101))
Out[30]: 0.00018429406441449519
Upvotes: 2