Reputation: 59
I have a question about computing hypergeometric probabilities in Python. Say I have a bag of 15 balls, of which 5 are blue and 10 are red. If I randomly pick 5 balls, what is the probability that exactly 4 of the 5 are blue? Here is the code I use to simulate it in Python:
import numpy as np
balls = ['blue']*5 + ['red']*10
count = 0
for i in range(10000):
    pick = np.random.choice(balls, 5)
    if list(pick).count('blue') == 4:
        count += 1
odds = count/10000
print(odds)
I get around 0.04. But if I use scipy.stats, I get a different number. The code is very simple.
from scipy import stats
odds=stats.hypergeom.pmf(4, 15, 5, 5)
print(odds)
I get 0.016. So why are these two different?
Upvotes: 1
Views: 301
Reputation: 114811
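To match the hypergeometric distribution, you must pick the 5 balls without replacement. np.random.choice samples with replacement by default (replace=True), so your simulation is really estimating a binomial probability with n=5 and p=5/15, which is 10/243, or about 0.041. The hypergeometric value is C(5,4)*C(10,1)/C(15,5) = 50/3003, or about 0.0167, which is what scipy reports. Change the line that generates pick to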
pick = np.random.choice(balls, 5, replace=False)
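As a rough check of the two numbers, here is a minimal sketch that compares both sampling schemes against their closed-form probabilities. It swaps in the standard library (random.sample for drawing without replacement, random.choices for drawing with replacement) in place of NumPy and SciPy so that it runs without extra dependencies. The simulate helper, the seed, and the trial count are my own choices for illustration, not part of the original code.

```python
import random
from math import comb

# Exact probability of exactly 4 blue in 5 draws from 5 blue + 10 red.
# Without replacement (hypergeometric): C(5,4)*C(10,1)/C(15,5) = 50/3003
p_without = comb(5, 4) * comb(10, 1) / comb(15, 5)
# With replacement (binomial, p = 5/15): C(5,4)*(1/3)^4*(2/3) = 10/243
p_with = comb(5, 4) * (1 / 3) ** 4 * (2 / 3)


def simulate(replace, trials=100_000, seed=0):
    """Estimate P(exactly 4 blue in 5 draws) by simulation (hypothetical helper)."""
    rng = random.Random(seed)
    balls = ['blue'] * 5 + ['red'] * 10
    hits = 0
    for _ in range(trials):
        if replace:
            pick = rng.choices(balls, k=5)   # each ball can be drawn repeatedly
        else:
            pick = rng.sample(balls, 5)      # each ball can be drawn at most once
        if pick.count('blue') == 4:
            hits += 1
    return hits / trials


sim_without = simulate(replace=False)
sim_with = simulate(replace=True)

print(f"without replacement: exact {p_without:.4f}, simulated {sim_without:.4f}")
print(f"with replacement:    exact {p_with:.4f}, simulated {sim_with:.4f}")
```

The simulated values should land close to 0.0167 and 0.041 respectively, which suggests the gap in the question comes from the sampling scheme rather than from scipy.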
Upvotes: 1