derchambers
derchambers

Reputation: 954

Unexpected behavior in scipy isf

I am using scipy's stats module to try and determine values of a distribution at which the upper tail probability reaches some small value, but I am getting some very unrealistic results. For example:

I fit a beta distribution to an array of the square of normalized correlation coefficients for a signal matching operation (correlation coefficient is always between -1 and 1 so its square is between 0 and 1). Using

import scipy, numpy as np
bd=scipy.beta.fit(np.square(data),floc=0,fscale=1) #fitted beta dist

which gives me the beta distribution parameters of (0.42119596435034012, 16939.046996018118, 0, 1) the data array is about 3 million elements long.

Now when I plot the distribution it is clear that most the area of the distribution is very near 0 on the x axis

import matplotlib.pyplot as plt
x=x=np.linspace(0,1,num=1000000)
plt.plot(x,scipy.stats.beta.pdf(x,betaparams[0],betaparams[1]))
plt.xlim([0,.0001])

enter image description here

Now when I try to find the x value for which some upper tail probability remains I get some unexpected behavior. For example

for expon in [-1,-2,-3,-4,-5,-6,-7,-8,-9,-10]:
    print (expon,scipy.stats.beta.isf(10**expon,betaparams[0],betaparams[1]))

yeilds:

(-1, 6.9580465891063448e-05)
(-2, 0.00018124328968143608)
(-3, 0.00030250611696189104)
(-4, 0.00042796070123291116)
(-5, 0.0005557482540313166)
(-6, 0.00068501413697673774)
(-7, 0.99999966996999767)
(-8, 0.99999996699699967)
(-9, 0.99999999669970008)
(-10, 0.99999999966997)

Clearly scipy is returning poor estimates around 10**-7. My question is why, why it would express this behavior silently, and how to fix it.

Thanks

Upvotes: 1

Views: 259

Answers (1)

Robert Dodier
Robert Dodier

Reputation: 17585

This appears to be a bug in scipy.special.btdtri which is supposed to compute quantiles for the beta distribution. Maybe you can file a bug report.

>>> from scipy import special
>>> special.btdtri (betaparams[0],betaparams[1], 1-1e-6)
0.00068501413697504238
>>> special.btdtri (betaparams[0],betaparams[1], 1-1e-7)
0.99999966996999767

I can't figure out where btdtri is defined.

EDIT: For the record, here is the SciPy bug report: https://github.com/scipy/scipy/issues/4677

Upvotes: 2

Related Questions