khuesmann
khuesmann

Reputation: 198

Unexpected distribution of p-values from t-test

I took two samples, each one consisting of 100K observations from the same standard normal distribution and tested the null hypothesis that their means are identical. I repeated this experiment 5K times and plotted the p-values in a histogram.

From my intuition, these two samples are sufficiently large and were sampled from the same distribution (same mean and std). Hence, I'd expect the t-tests to result in relatively high p-values (reject the null hypothesis). However, the p-values seem to be uniformly distributed.

Histogram of p-values for 5000 t-tests on two randomly generated normal distributions with n=100000, mu=0, sig=1

Here's the code I used to create this plot (I am using numpy 1.19.2, scipy 1.4.1):

from scipy import stats
import numpy as np

ps = []
for i in range(5000):
    gaussian_numbers = np.random.normal(0, 1, size=100000)
    gaussian_numbers2 = np.random.normal(0, 1, size=100000)
    t, p = stats.ttest_ind(gaussian_numbers, gaussian_numbers2, equal_var=True)
    ps.append(p)
plt.hist(ps, 100)

As you can see, I get a more or less uniform distributions of p-values throughout the whole value-range [0, 1].

Can someone tell me the flaw in my thinking? Can you replicate this?

Upvotes: 2

Views: 684

Answers (2)

Arturo Sbr
Arturo Sbr

Reputation: 6333

You are taking two random samples from the same distribution and calculating the t-statistic to test the null hypothesis that the means are identical.

There is no reason that the p-values should be distributed closer to 1 because the samples are random. To understand this, think of confidence intervals.

A confidence interval tells you that (1 - alpha) * 100 percent of the time, the true parameter will lie within the observed interval. Likewise, your p-values lie within 0 and 0.05 approximately 5% of the time.

In other words:

# Convert `ps` to numpy array
ps = np.array(ps)
# Check how many times you rejected H0
print('We rejected H0', (ps <= 0.05).sum(), 'times out of', len(ps))
print('We did not reject H0', (ps > 0.05).sum(), 'times out of', len(ps))

Which returns:

We rejected H0 246 times out of 5000

We did not reject H0 4754 times out of 5000

Upvotes: 0

Warren Weckesser
Warren Weckesser

Reputation: 114946

So i'd expect, that the t-test results in relatively high p-values, or a tendency to high p-values.

Your expectation is not correct. Your inputs satisfy the "null hypothesis" of the t-test: they are drawn from populations with the same mean. In general, when performing a hypothesis test such as the t-test and the input(s) satisfy the null hypothesis, the distribution of the p-value is uniform on the interval [0, 1]. So your plot is the expected result of your repeated tests.

Upvotes: 3

Related Questions