Jinay Jani

Reputation: 29

Bootstrap - Confidence Interval Calculation

I am trying to implement the bootstrap to estimate a confidence interval (CI) for a statistic. Here is the code I have written:

import numpy as np
import numpy.random as npr
import pylab

def bootstrap(data, num_samples, statistic, alpha):
    """Returns bootstrap estimate of 100.0*(1-alpha) CI for statistic."""
    num_samples = len(data)
    idx = npr.randint(min(data), max(data), num_samples)
    samples = data[idx]
    stat = np.sort(statistic(samples, 1))
    return (stat[int((alpha/2.0)*num_samples)],
            stat[int((1-alpha/2.0)*num_samples)])

X,Y = np.loadtxt('data/ABC.txt',
                          unpack =True,
                          delimiter =',',
                          skiprows = 1)

The text file contains 2 columns and I need to calculate the confidence interval for both columns. My first thought is to convert the columns into an array and calculate the high and low 95% CI. I was thinking of something like this:

data = np.array([X,Y])
low, high = bootstrap(X, len(data), np.mean, 0.05)
low1, high1 = bootstrap(Y, len(data), np.mean, 0.05)

But I am not sure if this is the correct way of calculating the confidence interval. Can someone help me with this?

Thank you in advance!

Upvotes: 1

Views: 1477

Answers (1)

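Instead of:

idx = npr.randint(min(data), max(data), num_samples)

which draws integers between the smallest and largest data values rather than valid positions in the array, resample the data directly with replacement:

samples = np.random.choice(data, size=len(data), replace=True)

np.random.choice returns the resampled values themselves, so the separate samples = data[idx] step is no longer needed.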
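For completeness, here is a minimal sketch of how the whole percentile bootstrap could look once the resampling is fixed. It is not the only way to do it: instead of one np.random.choice call per resample, it draws a (num_samples, n) matrix of indices so all resamples are built at once. The name bootstrap_ci, the rng argument, and the synthetic X and Y (standing in for the two columns loaded from the text file) are assumptions made for illustration.

```python
import numpy as np

def bootstrap_ci(data, num_samples, statistic, alpha, rng=None):
    """Percentile bootstrap estimate of the 100*(1-alpha)% CI for statistic."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = len(data)
    # Each row holds n indices in [0, n): one resample drawn with replacement.
    idx = rng.integers(0, n, size=(num_samples, n))
    samples = data[idx]  # shape (num_samples, n)
    # One value of the statistic per resample, sorted so we can read off percentiles.
    stats = np.sort(statistic(samples, axis=1))
    lower = stats[int((alpha / 2.0) * num_samples)]
    upper = stats[int((1 - alpha / 2.0) * num_samples)]
    return lower, upper

# Synthetic stand-ins for the two columns (assumption: the real X and Y are 1-D arrays).
rng = np.random.default_rng(0)
X = rng.normal(10.0, 2.0, size=200)
Y = rng.normal(-3.0, 1.0, size=200)

# num_samples is the number of bootstrap resamples, not the length of the data.
low, high = bootstrap_ci(X, 2000, np.mean, 0.05, rng)
low1, high1 = bootstrap_ci(Y, 2000, np.mean, 0.05, rng)
print(low, high)
print(low1, high1)
```

Note that num_samples here is the number of resamples (typically in the thousands), so passing len(data) for it, as in the question, would give far too few resamples for a stable interval. Each column is handled by its own call, which is fine as long as the two intervals are wanted separately.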

Upvotes: 2
