Python: Implement mean of means 95% Confidence Interval?

Question

How can a 95% confidence interval (CI) around a mean of means be computed using pandas/Python? The approach I am trying to implement comes from a stats.stackexchange solution.

import pandas as pd
from IPython.display import display
import scipy
import scipy.stats as st 
import scikits.bootstrap as bootstraps

data = pd.DataFrame({
     "exp1":[34, 41, 39] 
    ,"exp2":[45, 51, 52]
    ,"exp3":[29, 31, 35]
}).T

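# Compute both statistics from the three measurement columns only,
# so that row_std is not affected by the newly added row_mean column.
values = data.iloc[:, :3].copy()
data.loc[:,"row_mean"] = values.mean(axis=1)
data.loc[:,"row_std"] = values.std(axis=1)
display(data)

         0   1   2   row_mean   row_std
exp1    34  41  39  38.000000  3.605551
exp2    45  51  52  49.333333  3.785939
exp3    29  31  35  31.666667  3.055050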

mean_of_means = data.row_mean.mean()
std_of_means = data.row_mean.std()
confidence = 0.95
print("mean(means): {}\nstd(means): {}".format(mean_of_means,std_of_means))

mean(means): 39.66666666666667
std(means): 8.950481054731702

1st incorrect attempt (zscore):

zscore = st.norm.ppf(1-(1-confidence)/2)
lower_bound = mean_of_means - (zscore*std_of_means)
upper_bound = mean_of_means + (zscore*std_of_means)
print("95% CI = [{},{}]".format(lower_bound,upper_bound))

95% CI = [22.1,57.2] (incorrect solution)

2nd incorrect attempt (tscore):

tscore = st.t.ppf(1-0.05, data.shape[0])
lower_bound = mean_of_means - (tscore*std_of_means)
upper_bound = mean_of_means + (tscore*std_of_means)
print("95% CI = [{},{}]".format(lower_bound,upper_bound))

95% CI = [18.60,60.73] (incorrect solution)
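
For comparison, the critical value in this attempt is the one-sided 95% quantile with 3 degrees of freedom. A two-sided 95% interval over n = 3 means would normally use the 97.5% quantile with n - 1 = 2 degrees of freedom. A minimal sketch of the two values (the variable names are only illustrative):

```python
import scipy.stats as st

n = 3  # number of experiment means in the question

# Quantile used in the attempt above: one-sided 95%, df = n
one_sided_df_n = st.t.ppf(1 - 0.05, n)

# Quantile for a two-sided 95% interval: 97.5%, df = n - 1
two_sided_df_n_minus_1 = st.t.ppf(1 - 0.05 / 2, n - 1)

print(one_sided_df_n, two_sided_df_n_minus_1)
```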

3rd incorrect attempt (bootstrap):

CIs = bootstraps.ci(data=data.row_mean, statfunction=scipy.mean,alpha=0.05)

95% CI = [31.67, 49.33] (incorrect solution)
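
A rough percentile-bootstrap sketch using NumPy only (not the scikits.bootstrap call above, whose default method may differ) suggests why the interval stops at the smallest and largest row means. With three data points there are only 27 equally likely resamples, so a resample mean can never fall outside the observed range. The seed and the number of resamples are arbitrary choices for illustration.

```python
import numpy as np

# Row means from the table above
means = np.array([114 / 3, 148 / 3, 95 / 3])
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Resample the three means with replacement and record each resample's mean
boot_means = np.array([
    rng.choice(means, size=means.size, replace=True).mean()
    for _ in range(10_000)
])

# Percentile bootstrap: take the 2.5% and 97.5% quantiles of the resampled means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print("percentile bootstrap 95% CI = [{:.2f}, {:.2f}]".format(lower, upper))
```

Each extreme resample (all three draws equal to the smallest mean, or all equal to the largest) has probability 1/27, about 3.7%, which is more than 2.5%, so the percentile limits typically land exactly on the smallest and largest means.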

How can this solution be implemented using pandas/python to get the correct solution below?

95% CI = [17.4 to 61.9] (correct solution)

blehman · Accepted Answer

Thank you Jon Bates.
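
For reference, the calculation below appears to follow the usual t-based interval for a mean, applied to the n = 3 experiment means. The notation is my own assumption and not from the original post: \bar{x} is the mean of the means and s is their sample standard deviation.

```latex
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  = 39.667 \pm 4.303 \times \frac{8.950}{\sqrt{3}}
  \approx 39.667 \pm 22.234
```

This gives roughly [17.43, 61.90]. The earlier attempts multiplied the critical value by s itself rather than by the standard error s / sqrt(n).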

import pandas as pd
import scipy
import scipy.stats as st 

data = pd.DataFrame({
     "exp1":[34, 41, 39] 
    ,"exp2":[45, 51, 52]
    ,"exp3":[29, 31, 35]
}).T

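# Compute both statistics from the three measurement columns only,
# so that row_std is not affected by the newly added row_mean column.
values = data.iloc[:, :3].copy()
data.loc[:,"row_mean"] = values.mean(axis=1)
data.loc[:,"row_std"] = values.std(axis=1)

mean_of_means = data.row_mean.mean()
std_of_means = data.row_mean.std()

# Two-sided 95% interval: 97.5% quantile of Student's t with n - 1 degrees of freedom
tscore = st.t.ppf(1-0.025, data.shape[0]-1)

print("mean(means): {}\nstd(means): {}\ntscore: {}".format(mean_of_means,std_of_means,tscore))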

lower_bound = mean_of_means - (tscore*std_of_means/(data.shape[0]**0.5))
upper_bound = mean_of_means + (tscore*std_of_means/(data.shape[0]**0.5))

print("95% CI = [{},{}]".format(lower_bound,upper_bound))

mean(means): 39.66666666666667
std(means): 8.950481054731702
tscore: 4.302652729911275
95% CI = [17.432439139464606,61.90089419386874]
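
The same interval can probably be obtained more compactly with scipy.stats.sem and scipy.stats.t.interval. This is a sketch under the assumption that a two-sided t interval on the row means is what is wanted; mean_ci is a hypothetical helper name, not part of SciPy.

```python
import numpy as np
import scipy.stats as st

def mean_ci(values, confidence=0.95):
    """Two-sided t-based confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    center = values.mean()
    sem = st.sem(values)  # sample std (ddof=1) divided by sqrt(n)
    # The confidence level is passed positionally because its keyword
    # name differs between SciPy versions.
    return st.t.interval(confidence, values.size - 1, loc=center, scale=sem)

# Row means of exp1, exp2, exp3 from the data above
row_means = [114 / 3, 148 / 3, 95 / 3]
lower_bound, upper_bound = mean_ci(row_means)
print("95% CI = [{:.2f}, {:.2f}]".format(lower_bound, upper_bound))
```

With the DataFrame from the answer, mean_ci(data.row_mean) should give the same bounds.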
