Dia

Reputation: 11

Why is confidence interval greater than 1 for binary data?

I have binary data and want to calculate a confidence interval for the proportion of 1s, but the upper bound comes out greater than 1. Why? Here is my code:

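import math

def find_CI(a):
    # Normal-approximation (Wald) 95% interval for the proportion of 1s in a
    n = len(a)
    p_hat = sum(a) / n                        # sample proportion
    h = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    ub = p_hat + 1.96 * h                     # 1.96 = two-sided 95% z-value
    lb = p_hat - 1.96 * h
    return lb, ub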

When I pass a = [1,0,0,1,1], I get the result (0.17058551491594975, 1.0294144850840503)

I also tried the following code

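import numpy as np
import scipy.stats as st

def find_confidence_interval(a):
    # t-based 95% interval for the mean of a; the confidence level is passed
    # positionally because newer SciPy versions renamed `alpha` to `confidence`
    x = st.t.interval(0.95, df=len(a) - 1,
                      loc=np.mean(a),
                      scale=st.sem(a))
    return x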

and got (-0.08008738065825705, 1.280087380658257).

I am confused. Shouldn't the confidence interval be between 0 and 1?

Upvotes: 0

Views: 231

Answers (1)

Jon Strutz

Reputation: 321

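Using a t-statistic (or the z-value 1.96, as in your first function) to calculate confidence intervals for binomial data is probably not a good idea, because it assumes that the sample proportion is approximately normally distributed. With only 5 observations that approximation is poor, and the interval p̂ ± (critical value) × SE is symmetric around p̂, so nothing keeps it inside [0, 1].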
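To see where the upper bound of about 1.03 comes from, here is the arithmetic of the first function for a = [1,0,0,1,1] (3 ones out of n = 5), worked by hand:

```latex
\hat p = \tfrac{3}{5} = 0.6, \qquad
\mathrm{SE} = \sqrt{\frac{\hat p(1-\hat p)}{n}} = \sqrt{\frac{0.6 \cdot 0.4}{5}} = \sqrt{0.048} \approx 0.219

\hat p \pm 1.96\,\mathrm{SE} \approx 0.6 \pm 0.429 = (0.171,\ 1.029)
```

Because p̂ = 0.6 is only about 0.4 away from 1 while the half-width is about 0.43, the symmetric interval overshoots 1. The t version is wider still (t critical value ≈ 2.78 with 4 degrees of freedom), so it overshoots on both sides.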

See here for details on more appropriate ways to compute confidence intervals for a binomial proportion. For example, you could use the Wilson interval, which behaves well even when you don't have many data points and always stays within [0, 1]. For your example (3 ones out of 5), a 95% Wilson interval gives (0.23, 0.88).
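A minimal sketch of the Wilson score interval in plain Python, assuming a two-sided 95% level (z = 1.96, the same value used in the question). The function name `wilson_interval` is hypothetical, not part of any library. It uses the standard Wilson formula: center (p̂ + z²/2n) / (1 + z²/n) and half-width z / (1 + z²/n) · √(p̂(1 − p̂)/n + z²/4n²).

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (hypothetical helper).

    successes: number of 1s; n: number of trials; z=1.96 gives ~95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # The center is pulled from p_hat toward 0.5, which keeps the bounds in [0, 1]
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

a = [1, 0, 0, 1, 1]
lb, ub = wilson_interval(sum(a), len(a))
print(round(lb, 2), round(ub, 2))  # 0.23 0.88
```

If you'd rather not hand-roll it, `scipy.stats.binomtest(3, 5).proportion_ci(confidence_level=0.95, method='wilson')` (SciPy 1.7+) and statsmodels' `proportion_confint(3, 5, method='wilson')` compute the same interval.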

Upvotes: 0
