user2822693

Reputation: 1330

Scipy t-test one sample from statistics?

Is it possible to do a t-test using scipy.stats.ttest_1samp where the input is a statistic rather than an array? For example, with difference in means you have two options: ttest_ind() and ttest_ind_from_stats().

import numpy as np
import scipy.stats as stats
from scipy.stats import norm

mean1=35.6
std1=11.3
nobs1=84
mean2=44.7
std2=8.9
nobs2=84
print(stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=False))
# alternatively, you can pass 2 arrays
print(stats.ttest_ind(
    stats.norm.rvs(loc=mean1, scale=std1, size=84),
    stats.norm.rvs(loc=mean2, scale=std2, size=84),
    equal_var=False,
))

Is there an equivalent function with a one-sample t-test? Thank you for your help.

Upvotes: 6

Views: 1998

Answers (2)

Fan Ng

Reputation: 1

The method provided here is very good; however, the p-value might need a little adjustment, as a population size of 1 or 2 might distort it. I have tried to rewrite the function with a larger population size, as below:

def tttest_1samp_from_stats(sample_mean, sample_std, sample_size, popmean):
    POPSIZE = 100000000000000000
    return stats.ttest_ind_from_stats(mean1=sample_mean,
                                      std1=sample_std,
                                      nobs1=sample_size,
                                      mean2=popmean,
                                      std2=0.0,
                                      nobs2=POPSIZE,
                                      equal_var=False)

and it works for my case.

Example here:

data = [15.8,16.1,16.3,16.25,16.6, 16.22,16.2,16.1,16.15,16.08,16.32,
16.5, 16.6, 16.7, 16.3]

print(f"sample size: {len(data)}, sample mean: {sum(data)/len(data)}, standard deviation: {np.std(data, ddof=1)}")
# perform one sample t-test
result = stats.ttest_1samp(a=data, popmean=16.1)
print(result)

#Result:
sample size: 15, sample mean: 16.281333333333333, standard deviation: 0.23829353647090304
TtestResult(statistic=2.9472095236541365, pvalue=0.010604297674644585, df=14)

#########

mean = np.mean(data)
sd = np.std(data, ddof=1)
print(sd)
n = len(data)
print(tttest_1samp_from_stats(mean, sd, n, 16.1))

#Result:
Ttest_indResult(statistic=2.9472095236541365, pvalue=0.010604297674644585)

Upvotes: 0

MB-F

Reputation: 23647

TL;DR

There is no such function for the one sample test, but you can use the two sample function. In short, to perform a one sample t-test do this:

sp.stats.ttest_ind_from_stats(mean1=sample_mean, 
                              std1=sample_std, 
                              nobs1=n_samples, 
                              mean2=population_mean, 
                              std2=0, 
                              nobs2=2, 
                              equal_var=False)

Note that the result is completely independent of nobs2 (as it should be, since there is no n2 in the one sample test). Just make sure to pass in a value greater than 1 to avoid a division by zero.
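
One way to check this is to compare the workaround against scipy.stats.ttest_1samp on raw data and to vary nobs2. The following is only a minimal sketch: the helper name ttest_1samp_from_stats and the sample values are made up for illustration and are not part of SciPy.

```python
import numpy as np
from scipy import stats


def ttest_1samp_from_stats(sample_mean, sample_std, n_samples,
                           population_mean, nobs2=2):
    """Hypothetical helper: one-sample t-test from summary statistics.

    Delegates to Welch's test with a zero-variance second "sample".
    """
    return stats.ttest_ind_from_stats(mean1=sample_mean,
                                      std1=sample_std,
                                      nobs1=n_samples,
                                      mean2=population_mean,
                                      std2=0,
                                      nobs2=nobs2,
                                      equal_var=False)


# Illustrative data, chosen arbitrarily.
data = np.array([4.9, 5.6, 5.1, 6.2, 5.8, 5.3, 6.0, 5.5])
population_mean = 5.0

# Reference result computed from the raw array.
from_array = stats.ttest_1samp(data, popmean=population_mean)

# Same test computed from mean, sample standard deviation (ddof=1) and size.
from_stats = ttest_1samp_from_stats(data.mean(), data.std(ddof=1),
                                    len(data), population_mean)

print(from_array.statistic, from_array.pvalue)
print(from_stats.statistic, from_stats.pvalue)

# Changing nobs2 should leave the result unchanged.
for nobs2 in (2, 10, 1000):
    res = ttest_1samp_from_stats(data.mean(), data.std(ddof=1),
                                 len(data), population_mean, nobs2=nobs2)
    print(nobs2, res.statistic, res.pvalue)
```

Note that the standard deviation must be the sample standard deviation (ddof=1), otherwise the two results will not agree.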


How does it work?

Check out the Wikipedia page about the different types of t-test.

The one sample t-test uses the statistic

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$

with n - 1 degrees of freedom.
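
Since the one sample statistic is just (sample mean - population mean) / (s / sqrt(n)), it can also be computed by hand without going through the two sample function. The sketch below assumes a two-sided test; the function name one_sample_t_from_stats is hypothetical, and the input numbers are the summary statistics from the example data elsewhere on this page.

```python
import math
from scipy import stats


def one_sample_t_from_stats(sample_mean, sample_std, n, population_mean):
    """Hypothetical helper: two-sided one-sample t-test from summary statistics."""
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    standard_error = sample_std / math.sqrt(n)
    t_stat = (sample_mean - population_mean) / standard_error
    df = n - 1
    # Two-sided p-value from the survival function of Student's t distribution.
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value, df


# n = 15, tested against a population mean of 16.1.
t_stat, p_value, df = one_sample_t_from_stats(16.281333333333333,
                                              0.23829353647090304,
                                              15,
                                              16.1)
print(t_stat, p_value, df)
```

The printed statistic and p-value should agree with what stats.ttest_1samp reports for the same data (about 2.947 and 0.0106 in that example).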

The ttest_ind_from_stats function can do Welch's t-test (unequal sample size, unequal variance), which is defined as

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{\Delta}}} $$

with

$$ s_{\bar{\Delta}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

and degrees of freedom:

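$$ \nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} $$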

We can transform the definition of Welch's t-test into the one sample t-test. If we set mean2 to the population mean and std2 to 0, the equations for the t-statistic are the same, and the degrees of freedom reduce to n - 1.
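
Written out, the substitution looks as follows. This is a sketch using the standard Welch formulas from the Wikipedia page; the symbol \mu_0 for the population mean is notation chosen here, and sample 1 plays the role of the single sample (so n_1 = n).

```latex
% Set \bar{X}_2 = \mu_0 and s_2 = 0 in Welch's t-statistic:
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
  = \frac{\bar{X}_1 - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + 0}}
  = \frac{\bar{X}_1 - \mu_0}{s_1 / \sqrt{n_1}}

% The same substitution in the degrees of freedom:
\nu = \frac{\left(\frac{s_1^2}{n_1} + 0\right)^2}
           {\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{0}{n_2 - 1}}
    = n_1 - 1
```

The only place n_2 survives is the term 0 / (n_2 - 1), which is zero for any n_2 > 1 and undefined for n_2 = 1. That is why nobs2 must be greater than 1 but otherwise has no effect.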

Upvotes: 5
