alinsoar
alinsoar

Reputation: 15783

Unexpected behaviour of the random.sample function during Monte Carlo simulation

Suppose that I want to simulate in Python a classical problem: there is a bag with 55 % red balls, and 45 % green balls.

I want to extract 10 balls, and detect what is the probabilyty of all these balls to be green.

I use a Monte Carlo simulation, using the function random.sample(balls, 10) like this:

from random import choice, shuffle, sample, randint

Red = False
Green = True

bags = [Red for _ in range(55)]
bags.extend([Green for _ in range(45)])
# shuffle(bags)

def testonce():
    return all(sample(bags, 10))

def test(N):
    K = 0
    for _ in xrange(N): K += testonce()
    return float(K)/N

print '/', test (10000000)
print ':', .45**10

This code prints the probability detected by simulation , and the real probability (the correct answer). It prints so:

/ 0.0001848
: 0.00034050628916

This difference shows me that

What do I miss ? How to write correctly the simulation , such that when N grows, the returned number to converge to the real probability ?

Upvotes: 0

Views: 229

Answers (1)

tacaswell
tacaswell

Reputation: 87366

radom.sample chose without replacement, you are computing the probability with replacement.

random.sample(population, k)
    Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement.

(doc)

The correct probabilty (for you MC) is:

In [30]: np.prod(np.arange(36,46)/np.arange(91.0,101))
Out[30]: 0.00018429406441449519

Upvotes: 2

Related Questions