msunij

Reputation: 309

Workaround for 'sum pk not equal to 1' error in scipy's stats.rv_discrete module

In Python 3, scipy's stats.rv_discrete requires the probabilities to sum to 1, but because of how floats are represented in memory, the sum of my probabilities is not exactly 1.

Fortunately, scipy was installed in my home directory, so I was able to comment out the 'if' lines that check the sum in ~/.local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py, which made it work. But what should I do when this code has to run on another system? Copying the file to the working directory and importing it results in too many errors, and custom code written from scratch (using lists) seems highly inefficient.

if len(xk) != len(pk):
    raise ValueError("xk and pk need to have the same length.")
#if not np.allclose(np.sum(pk), 1):
    #raise ValueError("The sum of provided pk is not 1.")

An efficient function from scratch or a proper workaround is what I hope to get.

Upvotes: 3

Views: 941

Answers (1)

jose_bacoy

Reputation: 12684

You can normalize the values of pk so that you can avoid the error. This will "force" the sum of the probabilities to be equal to 1.

Before:

import numpy as np
from scipy import stats
xk = np.arange(7)
pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.19)  # sums to 0.99, not 1
custm = stats.rv_discrete(name='custm', values=(xk, pk))

Error: ValueError: The sum of provided pk is not 1.

After:

import numpy as np
from scipy import stats
xk = np.arange(7)
pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.19)
total = sum(pk)  # compute the sum once instead of once per element
pk_norm = tuple(p / total for p in pk)
custm = stats.rv_discrete(name='custm', values=(xk, pk_norm))

Result: Ok
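
If this has to run on other systems, the normalization can be wrapped in a small helper so that scipy itself never needs to be patched. The following is only a sketch: normalized_rv_discrete and max_drift are hypothetical names, and the 0.05 threshold is an arbitrary assumption. Note that the check quoted in the question uses np.allclose, which already tolerates tiny rounding differences, so a sum that trips it is usually off by more than float noise. For that reason the sketch refuses to rescale when the sum is far from 1 instead of silently hiding a mistake in the input.

```python
import numpy as np
from scipy import stats


def normalized_rv_discrete(xk, pk, max_drift=0.05, name="custm"):
    """Rescale pk to sum to 1, then build the rv_discrete.

    Hypothetical helper for illustration; max_drift is an assumed threshold.
    """
    xk = np.asarray(xk)
    pk = np.asarray(pk, dtype=float)
    if xk.shape != pk.shape:
        raise ValueError("xk and pk need to have the same length.")
    if np.any(pk < 0):
        raise ValueError("pk must not contain negative probabilities.")
    total = pk.sum()
    # A sum far from 1 usually means wrong input, not rounding,
    # so fail loudly rather than rescale it away.
    if abs(total - 1.0) > max_drift:
        raise ValueError(f"pk sums to {total}, too far from 1 to rescale.")
    # Vectorized division replaces the per-element Python loop.
    return stats.rv_discrete(name=name, values=(xk, pk / total))


xk = np.arange(7)
pk = (0.1, 0.2, 0.3, 0.1, 0.1, 0.0, 0.19)  # sums to 0.99
custm = normalized_rv_discrete(xk, pk)
print(custm.pmf(xk).sum())  # total probability after rescaling
```

Keep in mind that rescaling changes every probability slightly (here each one is divided by 0.99), so if one specific entry was meant to absorb the missing mass, adjust that entry directly instead.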

Upvotes: 3
