Reputation: 2173
I want to compute binomial probabilities in Python. I tried to apply the formula:
probability = scipy.misc.comb(n,k)*(p**k)*((1-p)**(n-k))
Some of the probabilities I get are infinite. I checked some of the inputs for which the probability comes out as inf. For one of them, n=450,000 and k=17. This value must be greater than about 1.8e308, which is the maximum value handled by floats.
I then tried to use sum(np.random.binomial(n,p,numberOfTrials)==valueOfInterest)/numberOfTrials
This draws numberOfTrials samples and computes the fraction of them that are equal to valueOfInterest.
This doesn't produce any infinite value. However, is this a valid way to proceed? And why doesn't this approach produce infinite values when computing the probabilities directly does?
Upvotes: 8
Views: 5451
Reputation: 11
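To avoid multiplying something close to zero by something close to infinity, do the multiplication step by step, like this:
def Pbinom(N, p, k):
    q = 1 - p
    # N-k factors of q (each < 1); their product is q**(N-k)
    lt1 = [q] * (N - k)
    # k factors whose product is C(N, k) * p**k
    gt1 = list(map(lambda x: p * (N - k + x) / x, range(1, k + 1)))
    Pb = 1.0
    while (len(lt1) + len(gt1)) > 0:
        if Pb > 1:
            # running product is large: shrink it with a factor of q if any remain
            if len(lt1) > 0:
                Pb *= lt1.pop()
            else:
                if len(gt1) > 0:
                    Pb *= gt1.pop()
        else:
            # running product is small: grow it with a combination factor if any remain
            if len(gt1) > 0:
                Pb *= gt1.pop()
            else:
                if len(lt1) > 0:
                    Pb *= lt1.pop()
    return Pb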
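The same interleaving idea can be written without building the two lists. This is only a minimal sketch using the standard library; the name binom_pmf_interleaved is made up here, and it still needs on the order of N multiplications:

```python
def binom_pmf_interleaved(n, p, k):
    """Binomial pmf computed by interleaving small and large factors."""
    q = 1.0 - p
    remaining_q = n - k   # factors of q still to be multiplied in
    x = 1                 # index of the next combination factor p*(n-k+x)/x
    prob = 1.0
    while x <= k or remaining_q > 0:
        if x > k or (prob > 1.0 and remaining_q > 0):
            # shrink the running product with one factor of q
            prob *= q
            remaining_q -= 1
        else:
            # grow the running product with one combination factor
            prob *= p * (n - k + x) / x
            x += 1
    return prob

small = binom_pmf_interleaved(10, 0.5, 5)           # C(10, 5) / 2**10
large = binom_pmf_interleaved(450000, 0.5, 225000)  # near the mode of the question's example
print(small, large)
```

The running product is pulled back toward 1 after every step, so it stays in the representable range as long as the final probability itself does.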
Upvotes: 1
Reputation: 25823
Because you're using scipy I thought I would mention that scipy already has statistical distributions implemented. Also note that when n is this large the binomial distribution is well approximated by the normal distribution (or Poisson if p is very small).
import numpy as np
import scipy.stats
n = 450000
p = .5
k = np.array([17., 225000, 226000])
b = scipy.stats.binom(n, p)
print(b.pmf(k))
# array([ 0.00000000e+00, 1.18941527e-03, 1.39679862e-05])
# normal approximation with the same mean and standard deviation
g = scipy.stats.norm(n*p, np.sqrt(n*p*(1-p)))
print(g.pdf(k))
# array([ 0.00000000e+00, 1.18941608e-03, 1.39680605e-05])
print(b.pmf(k) - g.pdf(k))
# array([ 0.00000000e+00, -8.10313274e-10, -7.43085142e-11])
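For a value as far in the tail as k = 17 the pmf underflows to 0, but the scipy.stats distributions also provide logpmf, which stays finite. A minimal sketch, assuming a reasonably recent SciPy; p_small and k_small are made-up values, only there to illustrate the Poisson remark:

```python
import numpy as np
from scipy import stats

n, p = 450000, 0.5
b = stats.binom(n, p)

# The pmf underflows to 0.0 deep in the tail, but its logarithm is representable.
log_prob = b.logpmf(17)
print(log_prob)

# Poisson approximation for very small p (illustrative values, not from the question).
p_small, k_small = 1e-5, 4
exact = stats.binom.pmf(k_small, n, p_small)
approx = stats.poisson.pmf(k_small, n * p_small)
print(exact, approx)
```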
Upvotes: 9
Reputation: 5068
Work in the log domain: compute the logarithms of the combination and of the power terms, add them up, and exponentiate only at the end.
Something like this:
import numpy as np
# C(n, k) = (k+1)*(k+2)*...*n / (1*2*...*(n-k)), so its log is a difference of two sums of logs
combination_num = np.arange(k+1, n+1)
combination_den = np.arange(1, n-k+1)
combination_log = np.log(combination_num).sum() - np.log(combination_den).sum()
p_k_log = k * np.log(p)
neg_p_K_log = (n - k) * np.log(1 - p)
p_log = combination_log + p_k_log + neg_p_K_log
probability = np.exp(p_log)
This gets rid of numeric underflow/overflow in the intermediate terms caused by the large numbers. On your example with n = 450000, p = 0.5 and k = 17, it returns p_log = -311728.4, i.e., the log of the final probability is so negative that underflow still occurs when taking np.exp. However, you can still work with the log probability.
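As an illustration of working with the log probability directly, here is a minimal sketch using only the standard library. It swaps the sums of logs for math.lgamma, and the helper name binom_logpmf is made up for this example:

```python
import math

def binom_logpmf(n, k, p):
    """Natural log of the binomial pmf, computed with log-gamma."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

n, p = 450000, 0.5
log_p17 = binom_logpmf(n, 17, p)
log_p18 = binom_logpmf(n, 18, p)

# Both probabilities underflow to 0.0 as floats, yet their ratio is well defined:
# P(18) / P(17) = (n - 17) / 18 * p / (1 - p)
ratio = math.exp(log_p18 - log_p17)
print(log_p17, ratio)
```

Comparisons, ratios and products of such probabilities can all be done on the logs, and math.exp is only applied to a difference that is small enough to be representable.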
Upvotes: 7
Reputation: 10667
I think you should do all your computation using logarithms:
import numpy as np
from scipy import special
lgam = special.gammaln
def binomial(n, k, p):
    # exp of log(C(n, k)) + k*log(p) + (n-k)*log(1-p); gammaln(n+1) is log(n!)
    return np.exp(lgam(n+1) - lgam(n-k+1) - lgam(k+1) + k*np.log(p) + (n-k)*np.log(1.-p))
Upvotes: 7