Different results from two implementations of ttest in scipy.stats

Question

Here are two ways to do an independent t-test (welch version) in scipy. Both are giving different results for the calculated pvalue as well as the t-statistic itself? Why is this the case?

import scipy
print(scipy.__version__)
# 1.4.1
from scipy.stats import ttest_ind, ttest_ind_from_stats
x1_a = [19.0924, 19.1055, 19.1192, 19.1431, 19.0970]
x1_b = [20.3323, 20.3472, 20.3417, 20.3408, 20.2849]
x1_c = [19.0448, 18.9576, 19.0171, 19.0184, 18.9534]
ttest_ind(x1_a, x1_c, equal_var=False)
# Ttest_indResult(statistic=5.568858312509857, pvalue=0.0014998806395224108)
ttest_ind_from_stats(np.mean(x1_a), np.std(x1_a), 5, np.mean(x1_c), np.std(x1_c), 5, equal_var=False)
# Ttest_indResult(statistic=6.226172871918404, pvalue=0.000844418100098984)
ttest_ind(x1_a, x1_b, equal_var=False)
# Ttest_indResult(statistic=-83.49461195258749, pvalue=1.3516515130741807e-12)
ttest_ind_from_stats(np.mean(x1_a), np.std(x1_a), 5, np.mean(x1_b), np.std(x1_b), 5, equal_var=False)
# Ttest_indResult(statistic=-93.34981404047603, pvalue=5.764760941006529e-13)

The things I tried to rule out possible causes include checking a possible rounding off issue by feeding it np.sqrt(np.var(x)) instead of np.std(x)), writing a custom testing function using the wikipedia explanation which gives a result similar to ttest_ind_from_stats, trying multiple values, manually calculating the sds to avoid an n-1/n issue and trying to read the documentation for source code but it seems that ttest_ind uses _ttest_ind_from_stats internally which raises my confusion. Here is my custom function:

from scipy.stats import t as tdist
def welch_ttest(m1, m2, s1, s2, n1, n2):
    numerator = m1 - m2
    denominator = np.sqrt((s1 ** 2) / n1 + (s2 ** 2) / n2)
    t = numerator / denominator
    dof_numerator = ((s1 ** 2) / n1 + (s2 ** 2) / n2) ** 2
    dof_denominator = ((s1 ** 4) / (n1 ** 2) / (n1 - 1) + (s2 ** 4) / (n2 ** 2) / (n2 - 1))
    dof = dof_numerator / dof_denominator
    p_half = tdist.cdf(t, dof)
    if p_half > 0.5:
        p_final = 2 * (1 - p_half)
    else:
        p_final = 2 * p_half
    return t, p_final  # returning t to check the validity of the function

Marat · Accepted Answer

np.std doesn't perform Bessel's correction. If it is replaced with pandas' version of std, the results match:

ttest_ind(x1_a, x1_c, equal_var=False)                                                             
# Ttest_indResult(statistic=5.568858312509857, pvalue=0.0014998806395224108)

ttest_ind_from_stats(np.mean(x1_a), pd.Series(x1_a).std(), 5, np.mean(x1_c), pd.Series(x1_c).std(), 5, equal_var=False)                                                                               
# Ttest_indResult(statistic=5.568858312509857, pvalue=0.0014998806395224108)

Or, if you don't want an extra import, just apply the correction in place by multiplying std by sqrt(n/n-1)

Different results from two implementations of ttest in scipy.stats

Answers (1)

Related Questions