I am trying to conduct a hypothesis test, but I am getting a p-value of 0.0. I would understand a very small value, but exactly 0.0 makes me think something is wrong. Here is my code:
import scipy.stats as st

# Two-sample t-test on monthly profit; NaNs are dropped before testing
results = st.ttest_ind(
    ultimate_users['monthly_profit'],
    surf_users['monthly_profit'],
    nan_policy='omit')
print('p-value: ', results.pvalue)
Output:
p-value: 0.0
There isn't necessarily an error. If the p-value is smaller than the smallest positive double-precision floating-point number (about 4.9e-324 when subnormal numbers are counted; the smallest normal double is about 2.2e-308), it underflows and you get a zero. This would not be a bug in your code or in SciPy; it's just a fundamental limitation of floating-point arithmetic.
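A quick way to see these limits in plain Python (a minimal sketch; `math.nextafter` requires Python 3.9 or later):

```python
import math
import sys

# Smallest positive *normal* double: about 2.2e-308
smallest_normal = sys.float_info.min
# Smallest positive *subnormal* double: about 4.9e-324
smallest_subnormal = math.nextafter(0.0, 1.0)

print(smallest_normal)     # 2.2250738585072014e-308
print(smallest_subnormal)  # 5e-324

# Anything much smaller than that cannot be represented and rounds to zero
print(1e-400)              # 0.0
print(1e-200 * 1e-200)     # 0.0
```

A p-value of, say, 1e-400 is therefore indistinguishable from 0.0 once it is stored in a Python float or a NumPy `float64`.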
If your sample size is large, it doesn't take much of a difference in sample means to get a zero p-value.
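To illustrate how sample size alone can push the p-value to zero, here is a sketch using `scipy.stats.ttest_ind_from_stats` with made-up summary statistics (two groups whose means differ by half a standard deviation, equal group sizes). Only `n` changes from row to row:

```python
from scipy import stats

# Hypothetical summary statistics: means differ by 0.5 standard deviations
mean_a, mean_b, sd = 0.5, 0.0, 1.0

p_values = []
for n in [10, 100, 1_000, 10_000, 100_000]:
    # Pooled-variance two-sample t-test computed from summary statistics
    res = stats.ttest_ind_from_stats(mean_a, sd, n, mean_b, sd, n)
    p_values.append(res.pvalue)
    print(f"n={n:>7}: t={res.statistic:8.2f}, p={res.pvalue:.3g}")
```

The t statistic grows roughly like the square root of `n`, so with enough observations the exact p-value drops far below 5e-324 and is reported as 0.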
import numpy as np
from scipy import stats
rng = np.random.default_rng(83469358365936)
x = rng.random(1000)
res = stats.ttest_ind(x, x + 1)
print(res.statistic, res.pvalue, res.df)
# -76.66392731424226 0.0 1998.0
If you really want to know the true p-value, use arbitrary-precision arithmetic (for example, mpmath) and the definition of the t distribution's CDF in terms of the regularized incomplete beta function.
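For reference, the identity involved: for a two-sided test with statistic $t$ and $\nu$ degrees of freedom, the p-value is twice the lower-tail CDF $F_\nu$ of Student's t distribution evaluated at $-|t|$, which can be written with the regularized incomplete beta function $I_x(a, b)$:

$$
p = 2\,F_\nu(-|t|) = I_{x}\!\left(\frac{\nu}{2}, \frac{1}{2}\right), \qquad x = \frac{\nu}{\nu + t^2}
$$

In mpmath this is `mp.betainc(nu/2, 1/2, x2=nu/(t**2 + nu), regularized=True)`.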
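from mpmath import mp
# Convert the t statistic and degrees of freedom to arbitrary-precision numbers
t = mp.mpf(res.statistic)
nu = mp.mpf(res.df)
# Two-sided p-value: regularized incomplete beta I_x(nu/2, 1/2) with x = nu/(t**2 + nu)
x2 = nu / (t**2 + nu)
p = mp.betainc(nu/2, mp.one/2, x2=x2, regularized=True)
print(p)
# 1.72157326887951e-597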
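As a sanity check, the same formula can be compared against SciPy on data where the p-value does not underflow. This is a sketch: the helper `two_sided_t_pvalue_mp` and the random data are hypothetical, and the degrees of freedom are computed explicitly for the pooled-variance test rather than read from `res.df`:

```python
import numpy as np
from mpmath import mp
from scipy import stats


def two_sided_t_pvalue_mp(t, df, dps=30):
    """Two-sided Student t p-value computed with mpmath (illustrative helper)."""
    with mp.workdps(dps):
        t = mp.mpf(t)
        nu = mp.mpf(df)
        # p = I_x(nu/2, 1/2) with x = nu / (nu + t**2)
        x = nu / (nu + t**2)
        return mp.betainc(nu / 2, mp.mpf(1) / 2, x2=x, regularized=True)


# Hypothetical data with a moderate effect, so SciPy's double-precision
# p-value stays well above the underflow range
rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = rng.normal(loc=0.5, size=50)
res = stats.ttest_ind(a, b)

df = len(a) + len(b) - 2  # degrees of freedom of the pooled-variance t-test
p_mp = two_sided_t_pvalue_mp(res.statistic, df)
print(res.pvalue)
print(p_mp)
```

The two values should agree to many significant digits; the mpmath version simply keeps working once the p-value falls below the double-precision range.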