Reputation: 61
So let's imagine I have an array of sample data which is normally distributed. What I want, is to compute the probability of another sample being less than -3 and provide a bootstrapped confidence interval for that probability. After doing some research, I found the bootstrapped
python library which I want to use to find the CI.
So I have:
import numpy as np
import bootstrapped.bootstrap as bs
import bootstrapped.stats_functions as bs_stats
mu, sigma = 2.5, 4 # mean and standard deviation
samples = np.random.normal(mu, sigma, 1000)
bs.bootstrap(samples, stat_func= ???)
What should I write for stat_func ? I tried writing a lambda function to compute the probability of -3, but it did not work. I know how to compute the probability of a sample being less than -3, it's simply the CI which I am having a hard time dealing with.
Upvotes: 3
Views: 936
Reputation: 8219
I followed the example of stat_functions.mean
from the bootstrapped
package. Below it is wrapped in a 'factory' so that you can specify the level at which you want to calculate the frequency (sadly you cannot pass it as an optional argument to functions that bootstrap()
is expecting). Basically prob_less_func_factory(level)
returns a function that calculates the proportion of your sample that is less than that level
. It can be used for matrices just like the example I followed.
def prob_less_func_factory(level = -3.0):
def prob_less_func(values, axis=1):
'''Returns the proportion of samples that are less than the 'level' of each row of a matrix'''
return np.mean(np.asmatrix(values)<level, axis=axis).A1
return prob_less_func
Now you pass it in like so
level = -3
bs_res = bs.bootstrap(samples, stat_func = prob_less_func_factory(level=level))
and the result I get (yours will be slightly different because samples
is random) is
0.088 (0.06999999999999999, 0.105)
so the boostrap
function estimated (well, calculated) the proportion of values in samples
that are less than -3
to be 0.088
and the confidence interval around it is (0.06999999999999999, 0.105)
For checking we can calculate the theoretical value of one sample from your distribution being less than -3
:
from scipy.stats import norm
print(f'Theoretical Prob(N(mean={mu},std={sigma})<{level}): {norm.cdf(level, loc=mu,scale =sigma)}')
and we get
Theoretical Prob(N(mean=2.5,std=4)<-3): 0.08456572235133569
so it all seems consistent consistent.
Upvotes: 1