Reputation: 198
I took two samples, each one consisting of 100K observations from the same standard normal distribution and tested the null hypothesis that their means are identical. I repeated this experiment 5K times and plotted the p-values in a histogram.
From my intuition, these two samples are sufficiently large and were sampled from the same distribution (same mean and std). Hence, I'd expect the t-tests to result in relatively high p-values (reject the null hypothesis). However, the p-values seem to be uniformly distributed.
Here's the code I used to create this plot (I am using numpy 1.19.2, scipy 1.4.1):
from scipy import stats
import numpy as np
ps = []
for i in range(5000):
gaussian_numbers = np.random.normal(0, 1, size=100000)
gaussian_numbers2 = np.random.normal(0, 1, size=100000)
t, p = stats.ttest_ind(gaussian_numbers, gaussian_numbers2, equal_var=True)
ps.append(p)
plt.hist(ps, 100)
As you can see, I get a more or less uniform distributions of p-values throughout the whole value-range [0, 1].
Can someone tell me the flaw in my thinking? Can you replicate this?
Upvotes: 2
Views: 684
Reputation: 6333
You are taking two random samples from the same distribution and calculating the t-statistic to test the null hypothesis that the means are identical.
There is no reason that the p-values should be distributed closer to 1 because the samples are random. To understand this, think of confidence intervals.
A confidence interval tells you that (1 - alpha) * 100 percent of the time, the true parameter will lie within the observed interval. Likewise, your p-values lie within 0 and 0.05 approximately 5% of the time.
In other words:
# Convert `ps` to numpy array
ps = np.array(ps)
# Check how many times you rejected H0
print('We rejected H0', (ps <= 0.05).sum(), 'times out of', len(ps))
print('We did not reject H0', (ps > 0.05).sum(), 'times out of', len(ps))
Which returns:
We rejected H0 246 times out of 5000
We did not reject H0 4754 times out of 5000
Upvotes: 0
Reputation: 114946
So i'd expect, that the t-test results in relatively high p-values, or a tendency to high p-values.
Your expectation is not correct. Your inputs satisfy the "null hypothesis" of the t-test: they are drawn from populations with the same mean. In general, when performing a hypothesis test such as the t-test and the input(s) satisfy the null hypothesis, the distribution of the p-value is uniform on the interval [0, 1]. So your plot is the expected result of your repeated tests.
Upvotes: 3