Reputation: 2698
The confidence interval of the mean has the following analytical solution:

$$\bar{x} \pm t_{\alpha/2,\,n-1} \, \frac{s}{\sqrt{n}}$$
Assuming that my dataset is normally distributed and that I do not know the population standard deviation, I can use the t-score to compute the CI of the mean. So I did:
from scipy import stats
import numpy as np
arr = np.array([4, 4, 1, 6, 6, 8, 1, 2, 3, 2, 2, 3, 4, 7, 6, 8, 0, 2, 8, 6, 5])
alpha = 0.05 # significance level = 5%
df = len(arr) - 1 # degrees of freedom = 20
t = stats.t.ppf(1 - alpha/2, df) # 95% confidence t-score = 2.086
s = np.std(arr, ddof=1) # sample standard deviation = 2.502
n = len(arr)
lower = np.mean(arr) - (t * s / np.sqrt(n))
upper = np.mean(arr) + (t * s / np.sqrt(n))
print((lower, upper))
>>> (3.0514065531195387, 5.329545827832843)
print(stats.t.interval(1 - alpha/2, df, loc=np.mean(arr), scale=s / np.sqrt(n)))
>>> (2.8672993716475763, 5.513653009304806)
The interval I calculated manually from the equation does not agree with the one returned by scipy's stats.t.interval. Where is this discrepancy coming from?
Upvotes: 1
Views: 1091
Reputation: 114831
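Your significance level is 0.05, so the confidence level is 0.95. Pass that value to stats.t.interval. Don't divide by 2; the function does that for you. Passing 1 - alpha/2 = 0.975 asks for a 97.5% interval, which is why your second result is wider than the manual one: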
In [62]: print(stats.t.interval(1 - alpha, df, loc=np.mean(arr), scale=s / np.sqrt(n)))
(3.0514065531195387, 5.329545827832843)
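To see both behaviours side by side, here is a minimal sketch that reuses the data from the question. The names mean, sem, t_crit, manual, builtin and too_wide are my own, chosen only for illustration. The confidence level is passed positionally to stats.t.interval, since as far as I know the keyword name for that argument has changed between SciPy versions.

```python
import numpy as np
from scipy import stats

arr = np.array([4, 4, 1, 6, 6, 8, 1, 2, 3, 2, 2, 3, 4, 7, 6, 8, 0, 2, 8, 6, 5])
alpha = 0.05                            # significance level
n = len(arr)
df = n - 1                              # degrees of freedom
mean = np.mean(arr)
sem = np.std(arr, ddof=1) / np.sqrt(n)  # standard error of the mean

# Manual two-sided interval: the tail split (alpha/2) happens here, in ppf.
t_crit = stats.t.ppf(1 - alpha / 2, df)
manual = (mean - t_crit * sem, mean + t_crit * sem)

# stats.t.interval takes the confidence level and splits the tails itself.
builtin = stats.t.interval(1 - alpha, df, loc=mean, scale=sem)

# Passing 1 - alpha/2 asks for a 97.5% interval, which is wider.
too_wide = stats.t.interval(1 - alpha / 2, df, loc=mean, scale=sem)

print(manual)
print(builtin)
print(too_wide)
```

The first two printed tuples should agree up to floating-point rounding, and the third should reproduce the wider interval from the question.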
Upvotes: 3