Kris
Kris

Reputation: 61

Find bootstrapped confidence interval with bootstrapped library

So let's imagine I have an array of sample data which is normally distributed. What I want, is to compute the probability of another sample being less than -3 and provide a bootstrapped confidence interval for that probability. After doing some research, I found the bootstrapped python library which I want to use to find the CI.

So I have:

import numpy as np
import bootstrapped.bootstrap as bs
import bootstrapped.stats_functions as bs_stats
mu, sigma = 2.5, 4 # mean and standard deviation
samples = np.random.normal(mu, sigma, 1000)
bs.bootstrap(samples, stat_func= ???)

What should I write for stat_func ? I tried writing a lambda function to compute the probability of -3, but it did not work. I know how to compute the probability of a sample being less than -3, it's simply the CI which I am having a hard time dealing with.

Upvotes: 3

Views: 936

Answers (1)

piterbarg
piterbarg

Reputation: 8219

I followed the example of stat_functions.mean from the bootstrapped package. Below it is wrapped in a 'factory' so that you can specify the level at which you want to calculate the frequency (sadly you cannot pass it as an optional argument to functions that bootstrap() is expecting). Basically prob_less_func_factory(level) returns a function that calculates the proportion of your sample that is less than that level. It can be used for matrices just like the example I followed.

def prob_less_func_factory(level = -3.0):
    def prob_less_func(values, axis=1):
        '''Returns the proportion of samples that are less than the 'level' of each row of a matrix'''
        return np.mean(np.asmatrix(values)<level, axis=axis).A1
    return prob_less_func

Now you pass it in like so

level = -3
bs_res = bs.bootstrap(samples, stat_func = prob_less_func_factory(level=level))

and the result I get (yours will be slightly different because samples is random) is

0.088    (0.06999999999999999, 0.105)

so the boostrap function estimated (well, calculated) the proportion of values in samples that are less than -3 to be 0.088 and the confidence interval around it is (0.06999999999999999, 0.105)

For checking we can calculate the theoretical value of one sample from your distribution being less than -3:

from scipy.stats import norm
print(f'Theoretical Prob(N(mean={mu},std={sigma})<{level}): {norm.cdf(level, loc=mu,scale =sigma)}')

and we get

Theoretical Prob(N(mean=2.5,std=4)<-3): 0.08456572235133569

so it all seems consistent consistent.

Upvotes: 1

Related Questions