Reputation: 1330
Is it possible to do a t-test using scipy.stats.ttest_1samp where the input is a statistic rather than an array? For example, with difference in means you have two options: ttest_ind() and ttest_ind_from_stats().
import numpy as np
import scipy.stats as stats
from scipy.stats import norm
mean1=35.6
std1=11.3
nobs1=84
mean2=44.7
std2=8.9
nobs2=84
print(stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=False))
# alternatively, you can pass 2 arrays
print(stats.ttest_ind(
    stats.norm.rvs(loc=mean1, scale=std1, size=nobs1),
    stats.norm.rvs(loc=mean2, scale=std2, size=nobs2),
    equal_var=False,
))
Is there an equivalent function with a one-sample t-test? Thank you for your help.
Upvotes: 6
Views: 1998
Reputation: 1
The method provided here is very good; however, the p-value might need a little adjustment, as a population size of 1 or 2 might distort it. I have rewritten the function with a much larger population size, as below:
def tttest_1samp_from_stats(sample_mean, sample_std, sample_size, popmean):
    # Treat the population as a second "sample" with zero variance and a huge size
    POPSIZE = 100000000000000000
    return stats.ttest_ind_from_stats(mean1=sample_mean,
                                      std1=sample_std,
                                      nobs1=sample_size,
                                      mean2=popmean,
                                      std2=0.0,
                                      nobs2=POPSIZE,
                                      equal_var=False)
and it works for my case.
Example here:
data = [15.8,16.1,16.3,16.25,16.6, 16.22,16.2,16.1,16.15,16.08,16.32,
16.5, 16.6, 16.7, 16.3]
print(f"sample size: {len(data)}, sample mean: {sum(data)/len(data)}, standard deviation: {np.std(data, ddof=1)}")
# perform one sample t-test
result = stats.ttest_1samp(a=data, popmean=16.1)
print(result)
#Result:
sample size: 15, sample mean: 16.281333333333333, standard deviation: 0.23829353647090304
TtestResult(statistic=2.9472095236541365, pvalue=0.010604297674644585, df=14)
#########
mean = np.mean(data)
sd = np.std(data, ddof=1)
print(sd)
n = len(data)
print(tttest_1samp_from_stats(mean, sd, n, 16.1))
#Result:
Ttest_indResult(statistic=2.9472095236541365, pvalue=0.010604297674644585)
Upvotes: 0
Reputation: 23647
There is no such function for the one sample test, but you can use the two sample function. In short, to perform a one sample t-test do this:
sp.stats.ttest_ind_from_stats(mean1=sample_mean,
                              std1=sample_std,
                              nobs1=n_samples,
                              mean2=population_mean,
                              std2=0,
                              nobs2=2,
                              equal_var=False)
Note that the result is completely independent of nobs2 (as it should be, since there is no n2 in the one-sample test). Just make sure to pass in a value > 1 to avoid a division by zero.
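The independence from nobs2 can be checked numerically. The following is only a small sketch: the sample statistics are borrowed from the question, while the population mean of 33.0 and the helper name one_sample_via_welch are made up for illustration.

```python
from scipy import stats

# Summary statistics taken from the question; the population mean is a
# made-up value used only for this illustration.
sample_mean, sample_std, n_samples = 35.6, 11.3, 84
population_mean = 33.0


def one_sample_via_welch(nobs2):
    """Hypothetical helper: one-sample test written as Welch's test
    against a second 'sample' that has zero variance."""
    return stats.ttest_ind_from_stats(
        mean1=sample_mean, std1=sample_std, nobs1=n_samples,
        mean2=population_mean, std2=0.0, nobs2=nobs2,
        equal_var=False,
    )


# Try a few very different values of nobs2 (all greater than 1).
results = {nobs2: one_sample_via_welch(nobs2) for nobs2 in (2, 10, 1_000_000)}
for nobs2, res in results.items():
    print(nobs2, res.statistic, res.pvalue)
```

All three lines should print the same statistic and p-value, which is why the choice of nobs2 does not matter as long as it is greater than 1.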
Check out the Wikipedia page about the different types of t-test.
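The one sample t-test uses the statistic
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
with n - 1 degrees of freedom, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size and $\mu_0$ the population mean.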
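The one-sample statistic can also be computed directly from summary statistics, without going through the two-sample function. This is a minimal sketch, not a SciPy API: one_sample_t_from_stats is a hypothetical helper, it assumes a two-sided test, and it assumes the standard deviation was computed with ddof=1. The data are the example values from the other answer.

```python
import math

import numpy as np
from scipy import stats


def one_sample_t_from_stats(sample_mean, sample_std, n, popmean):
    """Hypothetical helper: two-sided one-sample t-test from summary stats.

    sample_std is assumed to be the sample standard deviation (ddof=1).
    """
    standard_error = sample_std / math.sqrt(n)
    t_stat = (sample_mean - popmean) / standard_error
    df = n - 1
    # Two-sided p-value from the survival function of Student's t
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value, df


# Example data from the other answer in this thread
data = [15.8, 16.1, 16.3, 16.25, 16.6, 16.22, 16.2, 16.1, 16.15, 16.08,
        16.32, 16.5, 16.6, 16.7, 16.3]

t_stat, p_value, df = one_sample_t_from_stats(
    np.mean(data), np.std(data, ddof=1), len(data), popmean=16.1)
reference = stats.ttest_1samp(data, popmean=16.1)

print(t_stat, p_value, df)
print(reference.statistic, reference.pvalue)
```

Both print calls should agree up to floating-point rounding, since the helper applies the same formula that ttest_1samp uses on the raw array.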
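The ttest_ind_from_stats function can do Welch's t-test (unequal sample size, unequal variance), which is defined as
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
and degrees of freedom:
$$\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$$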
We can transform the definition of Welch's t-test into the one sample t-test. If we set mean2 to the population mean and std2 to 0, the equations for the t-statistic are the same, and the degrees of freedom reduce to n - 1.
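Spelled out, the reduction looks roughly like this. The notation is an assumption for this sketch: $\bar{x}_1, s_1, n_1$ stand for mean1, std1, nobs1, and $\bar{x}_2 = \mu_0$, $s_2 = 0$, $n_2 > 1$ stand for mean2, std2, nobs2.

```latex
% Welch's statistic with s_2 = 0 and \bar{x}_2 = \mu_0
t = \frac{\bar{x}_1 - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{0}{n_2}}}
  = \frac{\bar{x}_1 - \mu_0}{s_1 / \sqrt{n_1}}

% Welch's degrees of freedom with s_2 = 0 (requires n_2 > 1)
\nu = \frac{\left(\frac{s_1^2}{n_1} + 0\right)^2}
           {\frac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \frac{0}{n_2 - 1}}
    = n_1 - 1
```

The term $0 / (n_2 - 1)$ is the only place where $n_2$ survives, which is why nobs2 = 1 leads to a division by zero while any larger value drops out of the result.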
Upvotes: 5