Andriana

Reputation: 387

How to calculate 95% confidence intervals using Bootstrap method

I'm trying to calculate the confidence interval for the mean using the bootstrap method in Python. Let's say I have a vector a with 100 entries, and my aim is to calculate the mean of these 100 values and its 95% confidence interval using the bootstrap. So far I have managed to resample 1000 times from my vector using the np.random.choice function. Then, for each bootstrap vector with 100 entries, I calculated the mean. So now I have 1000 bootstrap mean values and a single sample mean from my initial vector, but I'm not sure how to proceed from here. How can I use these mean values to find the confidence interval for the mean of my initial vector? I'm relatively new to Python and this is the first time I have come across the bootstrap method, so any help would be much appreciated.

Upvotes: 11

Views: 19456

Answers (3)

Bogdan Lalu

Reputation: 57

I have a simple statistical solution: confidence intervals are based on the standard error. In your case, the standard error is the standard deviation of your 1000 bootstrap means. Assuming the sampling distribution of your parameter (the mean) is normal, which the Central Limit Theorem should justify, just multiply the z-score for the desired confidence level by that standard deviation. Therefore:

lower boundary = mean of your bootstrap means - 1.96 * std. dev. of your bootstrap means

upper boundary = mean of your bootstrap means + 1.96 * std. dev. of your bootstrap means

95% of cases in a normal distribution lie within 1.96 standard deviations of the mean.
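The normal-approximation interval described above can be sketched roughly as follows. This is only an illustration: the data vector `a` is synthetic, and the seed, the variable names and the choice of NumPy's `default_rng` generator are assumptions that do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only to make the sketch reproducible
a = rng.normal(loc=10.0, scale=2.0, size=100)  # hypothetical stand-in for the 100-entry vector

n_boot = 1000
# Each row is one bootstrap resample of a (drawn with replacement);
# taking the mean along axis 1 gives the 1000 bootstrap means.
boot_means = rng.choice(a, size=(n_boot, a.size), replace=True).mean(axis=1)

se = boot_means.std(ddof=1)   # bootstrap estimate of the standard error
center = boot_means.mean()    # mean of the bootstrap means
lower = center - 1.96 * se
upper = center + 1.96 * se
print(lower, upper)
```

With real data, replace the synthetic `a` with your own vector; the rest of the sketch stays the same.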

Hope this helps.

Upvotes: 4

Idan Azuri

Reputation: 721

First, I suggest you deepen your understanding of the bootstrap method and its usage. The main idea is to handle a shortage of data by resampling what you have to produce more of it.

Second, regarding the confidence interval, you can use the Wilson score interval, which is meant for ranking binomial proportions. I found this IPython notebook, which explains what you asked for.

A short example of the Wilson interval:

import math


def ci(positive, n, z):
    """Wilson score interval for `positive` successes out of `n` trials."""
    # z is the two-sided z-score, e.g. 1.96 for 95%
    phat = positive / n
    denom = 1 + z * z / n
    center = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (center - margin) / denom, (center + margin) / denom


sample_size = [50, 100, 200, 400, 8000]
# Two-sided z-scores for each confidence level
z_rate_confidence = {'95%': 1.96, '90%': 1.645, '75%': 1.15}
success_rate = [0.6, 0.7, 0.8]
for confidence, z in z_rate_confidence.items():
    print('confidence: ' + confidence + '\n')
    for n in sample_size:
        print('sample size: ', n)
        for s in success_rate:
            print(ci(s * n, n, z))

Upvotes: -1

Horia Coman

Reputation: 8781

You could sort the array of 1000 means and use the 50th and 950th elements as the 90% bootstrap confidence interval. For the 95% interval you asked about, use the 25th and 975th elements instead.

Your set of 1000 means is basically a sample of the distribution of the mean estimator (the sampling distribution of the mean). So, any operation you could do on a sample from a distribution you can do here.
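As a rough sketch of this percentile approach, `np.percentile` can stand in for sorting the array and picking elements by hand. The data below is synthetic and the variable names are illustrative assumptions, not something taken from the question.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, only to make the sketch reproducible
a = rng.exponential(scale=3.0, size=100)  # hypothetical 100-entry data vector

# 1000 bootstrap means: resample a with replacement and average each resample
boot_means = np.array(
    [rng.choice(a, size=a.size, replace=True).mean() for _ in range(1000)]
)

# The 2.5th and 97.5th percentiles of the bootstrap means bound the 95% interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(a.mean(), ci_low, ci_high)
```

Unlike the normal approximation, this interval does not have to be symmetric around the sample mean, which can matter for skewed data.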

Upvotes: 9
