I have binary data and want to calculate a confidence interval for the proportion of 1s, but the upper bound comes out greater than 1. Why? Here is my code:
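import math

def find_CI(a):
    n = len(a)
    p_hat = sum(a) / n                        # sample proportion of 1s
    h = math.sqrt(p_hat * (1 - p_hat) / n)    # normal-approximation (Wald) standard error
    ub = p_hat + 1.96 * h                     # 1.96 is the z value for a 95% interval
    lb = p_hat - 1.96 * h
    return lb, ub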
When I pass a = [1,0,0,1,1], I get the result (0.17058551491594975, 1.0294144850840503)
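For reference, the upper bound follows directly from the normal-approximation (Wald) formula that find_CI implements, with 3 ones out of n = 5:

$$
\begin{aligned}
\hat p &= \tfrac{3}{5} = 0.6, \\
\mathrm{SE} &= \sqrt{\frac{\hat p\,(1-\hat p)}{n}} = \sqrt{\frac{0.6 \cdot 0.4}{5}} = \sqrt{0.048} \approx 0.2191, \\
\mathrm{UB} &= \hat p + 1.96\,\mathrm{SE} \approx 0.6 + 0.4294 = 1.0294.
\end{aligned}
$$

Nothing in \hat p + 1.96 SE caps the result at 1: here the margin 0.4294 is larger than 1 - \hat p = 0.4, so the bound overshoots.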
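I also tried the following code:
import numpy as np
import scipy.stats as st

def find_confidence_interval(a):
    # 95% t-interval around the sample mean. The confidence level is passed
    # positionally because recent SciPy versions renamed the `alpha` keyword
    # of interval() to `confidence`.
    x = st.t.interval(0.95, df=len(a) - 1,
                      loc=np.mean(a),
                      scale=st.sem(a))
    return x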
This gave (-0.08008738065825705, 1.280087380658257), which is even wider and has a negative lower bound.
I am confused. Shouldn't a confidence interval for a proportion lie between 0 and 1?
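Using a t-statistic to calculate confidence intervals for binomial data is probably not a good idea, because it assumes the sample mean (here, the sample proportion) is approximately normally distributed. With only five 0/1 observations that approximation is poor, and nothing in the formula keeps the bounds inside [0, 1]. The same applies to your first function, which uses the normal (Wald) approximation with z = 1.96.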
See here for details on how to deal more appropriately with confidence intervals for binomial proportions. For example, if you don't have many data points you could use the Wilson score interval. For your [1, 0, 0, 1, 1] example (3 successes out of 5), a 95% Wilson interval is approximately (0.23, 0.88).
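A minimal sketch of the Wilson score interval in plain Python, assuming the input is a list of 0/1 outcomes and using z = 1.96 for roughly 95% confidence. The function name wilson_interval is just an illustrative name, not a library API. Unlike the Wald interval, the Wilson bounds always stay within [0, 1].

```python
import math


def wilson_interval(a, z=1.96):
    """Wilson score interval for the proportion of 1s in a list of 0/1 outcomes.

    z=1.96 gives an approximately 95% two-sided interval.
    """
    n = len(a)
    if n == 0:
        raise ValueError("need at least one observation")
    p_hat = sum(a) / n          # sample proportion of 1s
    z2 = z * z
    denom = 1 + z2 / n
    # Centre is pulled from p_hat towards 0.5; the pull shrinks as n grows.
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return centre - half_width, centre + half_width


lb, ub = wilson_interval([1, 0, 0, 1, 1])
print(round(lb, 2), round(ub, 2))  # 0.23 0.88
```

If statsmodels is installed, statsmodels.stats.proportion.proportion_confint(3, 5, method='wilson') should give the same bounds without writing the formula by hand.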