Reputation: 3337
I want to perform a two-sample, one-tailed t-test to compare two means. For the specific problem I am looking at, I want the comparison to go in one direction only. I would like the null hypothesis to be that mu_2 > mu_1 and the alternative hypothesis to be mu_1 >= mu_2. Or should the null hypothesis still be that mu_1 - mu_2 = 0, even for the one-tailed case?
I am working with a large dataset; the rounded summary statistics are mu_1 = 4.3, s_1 = 4.8, n_1 = 40000 for data_1 and mu_2 = 4.9, s_2 = 4.4, n_2 = 30000 for data_2. I am using scipy to perform a two-sample t-test:
from scipy import stats
stats.ttest_ind(data1, data2, equal_var=False)
Given that scipy only performs a two-tailed test here, I am not sure how to interpret the result: Ttest_indResult(statistic=-19.51646312898464, pvalue=1.3452106729078845e-84). The alpha value is 0.05, and the p-value is much smaller than that, which would mean the null hypothesis is rejected. However, my intuition tells me that the null hypothesis should not be rejected, because mu_2 is clearly larger than mu_1 (at the very minimum I would expect the p-value to be larger). Therefore, I feel like I'm either interpreting the results incorrectly or need to do additional calculations to get the correct answer.
I would appreciate any additional help and guidance. Thanks!
Upvotes: 1
Views: 8020
Reputation: 363
SciPy >= 1.6
You can now do a two-sample, one-tailed test by using the "alternative" parameter, per the documentation. The example below uses "less"; the available options are 'two-sided', 'less', and 'greater'.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
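from scipy.stats import ttest_ind

# alternative="less" tests H1: mean(data1) < mean(data2).
# Add equal_var=False to get Welch's test, as used in the question.
tstat, pval = ttest_ind(data1, data2, alternative="less")
# For 1-D inputs the results are scalars, so no indexing is needed.
print("t-statistic", "{0:.10f}".format(tstat))
print("p-value", "{0:.10f}".format(pval))
if pval < 0.05:
    print("we reject the null hypothesis")
else:
    print("we fail to reject the null hypothesis")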
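As a rough check that needs only the rounded summary statistics quoted in the question, here is a minimal sketch using scipy.stats.ttest_ind_from_stats, which also accepts the "alternative" parameter in SciPy >= 1.6. The variable names and the results dictionary are illustrative, and because the inputs are rounded, the numbers will not match the asker's exact output.

```python
from scipy.stats import ttest_ind_from_stats

# Rounded summary statistics quoted in the question (not the raw data)
mean1, std1, n1 = 4.3, 4.8, 40000
mean2, std2, n2 = 4.9, 4.4, 30000

results = {}
for alt in ("two-sided", "less", "greater"):
    # equal_var=False selects Welch's t-test, as in the question
    results[alt] = ttest_ind_from_stats(
        mean1, std1, n1, mean2, std2, n2,
        equal_var=False, alternative=alt,
    )
    print(alt, results[alt].statistic, results[alt].pvalue)
```

The statistic is the same for all three alternatives; only the p-value changes. With these inputs the "greater" alternative (mu_1 > mu_2) gives a p-value close to 1, while "less" gives a very small one.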
Upvotes: 2
Reputation: 46898
You are correct: for the one-sided test you describe, the p-value should be large. By default, ttest_ind performs a two-sided test, which gives the probability of observing a statistic at least as extreme as the absolute value of your t-statistic, in either direction.
To do a one-sided t-test, you can use the cdf, which is the cumulative probability up to your t-statistic.
Modifying this code slightly:
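import numpy as np
from scipy.stats import t

def welch_ttest(x1, x2, alternative):
    n1 = x1.size
    n2 = x2.size
    m1 = np.mean(x1)
    m2 = np.mean(x2)
    # Sample variances (ddof=1)
    v1 = np.var(x1, ddof=1)
    v2 = np.var(x2, ddof=1)
    tstat = (m1 - m2) / np.sqrt(v1 / n1 + v2 / n2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 / n1 + v2 / n2)**2 / (v1**2 / (n1**2 * (n1 - 1)) + v2**2 / (n2**2 * (n2 - 1)))
    if alternative == "equal":      # two-sided
        p = 2 * t.cdf(-abs(tstat), df)
    elif alternative == "lesser":   # H1: mean(x1) < mean(x2)
        p = t.cdf(tstat, df)
    elif alternative == "greater":  # H1: mean(x1) > mean(x2)
        p = 1 - t.cdf(tstat, df)
    else:
        raise ValueError("alternative must be 'equal', 'lesser' or 'greater'")
    return tstat, df, p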
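For reference, the quantities computed in the function above can be written out as follows. The notation is my own shorthand, not taken from the linked code: s_i^2 is the sample variance of group i, and F_nu is the cdf of Student's t distribution with nu degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
           {\dfrac{s_1^4}{n_1^2\,(n_1 - 1)} + \dfrac{s_2^4}{n_2^2\,(n_2 - 1)}}

p_{\text{equal}} = 2\,F_\nu(-|t|),
\qquad
p_{\text{lesser}} = F_\nu(t),
\qquad
p_{\text{greater}} = 1 - F_\nu(t)
```

The two one-sided p-values always sum to 1, so a very small "lesser" p-value implies a "greater" p-value close to 1.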
I simulate some data:
import numpy as np
from scipy.stats import ttest_ind
from scipy.stats import t

np.random.seed(seed=123)
data1 = np.random.normal(4.3, 4.8, size=40000)
np.random.seed(seed=123)
data2 = np.random.normal(4.9, 4.4, size=30000)
# Pooled-variance degrees of freedom (not used by the Welch test below)
ndf = len(data1) + len(data2) - 2
ttest_ind(data1, data2, equal_var=False)
Ttest_indResult(statistic=-16.945279258324227, pvalue=2.8364816571790452e-64)
This is close to your result. We can first check the code above with alternative == "equal", which is the two-sided test:
welch_ttest(data1,data2,"equal")
(-16.945279258324227,
67287.08544468222,
2.8364816571790452e-64)
You get the same p-value as the scipy two-sided t-test. Now we do the one-sided test you need:
welch_ttest(data1,data2,"greater")
(-16.945279258324227, 67287.08544468222, 1.0)
Upvotes: 1
Reputation: 696
Here is another way to get the one-sided p-value, by converting the two-sided p-value returned by ttest_ind:
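import numpy as np
from scipy.stats import ttest_ind

def t_test(x, y, alternative='both-sided'):
    # Two-sided Welch p-value; halve it when the observed difference
    # points in the direction of the alternative, else take 1 - p/2.
    _, double_p = ttest_ind(x, y, equal_var=False)
    if alternative == 'both-sided':
        pval = double_p
    elif alternative == 'greater':
        if np.mean(x) > np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    elif alternative == 'less':
        if np.mean(x) < np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    else:
        raise ValueError("alternative must be 'both-sided', 'greater' or 'less'")
    return pval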
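To see that this halving rule should agree with SciPy's built-in one-sided tests, here is a small sketch on made-up data. The sample sizes, means, and seed are arbitrary assumptions; it requires SciPy >= 1.6 for the "alternative" parameter, and it uses the sign of the t-statistic, which matches the sign of the mean difference.

```python
import numpy as np
from scipy.stats import ttest_ind

# Arbitrary illustrative data, not the asker's dataset
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.2, 1.5, size=150)

# Two-sided Welch test
t_stat, p_two = ttest_ind(x, y, equal_var=False)

# Halving rule: p/2 if the statistic points toward the alternative, else 1 - p/2
p_less_manual = p_two / 2 if t_stat < 0 else 1 - p_two / 2
p_greater_manual = p_two / 2 if t_stat > 0 else 1 - p_two / 2

# SciPy's own one-sided p-values for comparison
p_less = ttest_ind(x, y, equal_var=False, alternative="less").pvalue
p_greater = ttest_ind(x, y, equal_var=False, alternative="greater").pvalue

print("less:   ", p_less_manual, p_less)
print("greater:", p_greater_manual, p_greater)
```

The manual and built-in values should match up to floating-point error, and the two one-sided p-values should sum to 1.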
Upvotes: 3