Kushal Mohnot

Reputation: 105

scipy.stats.ttest_ind returning a p-value of 0.0

I am trying to conduct a hypothesis test, but I am getting a p-value of 0.0. I can understand if it is really small, but a 0.0 is leading me to believe that there is an error. Here is my code:

import scipy.stats as st

results = st.ttest_ind(
    ultimate_users['monthly_profit'],
    surf_users['monthly_profit'],
    nan_policy='omit')

print('p-value: ', results.pvalue)

output:

p-value:  0.0

Upvotes: 0

Views: 963

Answers (1)

Matt Haberland

Reputation: 3738

There isn't necessarily an error. If the p-value is smaller than what double precision floating point can represent (the smallest normal value is ~2.2e-308), it underflows and you get a zero. This would not be a bug in your code or in SciPy; it's just a fundamental limitation of floating point arithmetic.
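
To make the underflow concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and not part of SciPy; the point is just that a product whose true value is far below the representable range comes back as exactly 0.0.

```python
import sys

# Smallest positive *normal* double (about 2.2e-308).
smallest_normal = sys.float_info.min

# Smallest positive *subnormal* double; anything meaningfully smaller rounds to 0.0.
smallest_subnormal = 5e-324

# The true value of this product is 1e-400, which a double cannot represent.
product = 1e-200 * 1e-200

print(smallest_normal)
print(smallest_subnormal / 2)  # halving the smallest subnormal underflows to zero
print(product)                 # underflows to zero as well
```

The same thing happens inside the p-value calculation when the tail probability is this small.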

If your sample size is large, it doesn't take much of a difference in sample means to get a zero p-value.

import numpy as np
from scipy import stats

rng = np.random.default_rng(83469358365936)
x = rng.random(1000)

# Shift the second sample by 1; with n=1000 the t statistic is enormous.
res = stats.ttest_ind(x, x + 1)
print(res)
# TtestResult(statistic=-76.66392731424226, pvalue=0.0, df=1998.0)

If you really want to know the true p-value, use arbitrary precision arithmetic and the definition of the t distribution CDF.

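from mpmath import mp

# `res` is the result returned by stats.ttest_ind(x, x + 1) in the example above.
t = mp.mpf(res.statistic)
nu = mp.mpf(res.df)

# Two-sided p-value: regularized incomplete beta function I_x(nu/2, 1/2)
# evaluated at x = nu / (t**2 + nu).
x2 = nu / (t**2 + nu)
p = mp.betainc(nu/2, mp.one/2, x2=x2, regularized=True)
print(p)
# 1.72157326887951e-597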
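
If you want to reuse this, the calculation can be wrapped in a small helper. This is only a sketch: `two_sided_t_pvalue` is a hypothetical name, not a SciPy or mpmath function. It checks itself against the one case with a simple closed form (df = 1, where the two-sided p-value is 1 - (2/pi)*atan(|t|), so t = 1 gives 0.5) and then reports the tiny p-value on a log10 scale, which is often easier to read.

```python
from mpmath import mp


def two_sided_t_pvalue(statistic, df):
    """Two-sided p-value of a t statistic, returned as an mpmath float.

    Hypothetical helper for illustration; it assumes the standard
    two-sided Student's t test.
    """
    t = mp.mpf(statistic)
    nu = mp.mpf(df)
    x2 = nu / (t**2 + nu)
    # Regularized incomplete beta function I_x2(nu/2, 1/2).
    return mp.betainc(nu / 2, mp.one / 2, x2=x2, regularized=True)


# Sanity check against the df=1 closed form: t=1 should give p=0.5.
p_check = two_sided_t_pvalue(1, 1)

# Statistic and degrees of freedom taken from the ttest_ind example above.
p_tiny = two_sided_t_pvalue(-76.66392731424226, 1998)
log10_p = mp.log10(p_tiny)

print(p_check)
print(log10_p)
```

Keep the result as an mpmath number; converting it with `float(p_tiny)` would underflow back to 0.0.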

Upvotes: 0
