I have a data frame with many columns. The first column contains categories such as "System 1" and "System 2" (coded S1, S2), and the remaining columns (Q1, Q2, ...) contain 0s and 1s. For example:
SYSTEM | Q1 | Q2 |
---|---|---|
S1 | 0 | 1 |
S1 | 1 | 0 |
S2 | 1 | 1 |
S2 | 0 | 0 |
S2 | 1 | 1 |
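To make the example reproducible, the small table above can be typed in directly. A quick sanity check is the per-group mean of each 0/1 column, which is the proportion of 1s. This is a minimal sketch; the names df and props are only for illustration:

```r
# Small example table from the question, typed in directly
df <- data.frame(
  SYSTEM = c("S1", "S1", "S2", "S2", "S2"),
  Q1     = c(0, 1, 1, 0, 1),
  Q2     = c(1, 0, 1, 0, 1)
)

# Per-group mean of each 0/1 column, i.e. the proportion of 1s
props <- aggregate(cbind(Q1, Q2) ~ SYSTEM, data = df, FUN = mean)
props
```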
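I have this R code that uses boot with an indexing statistic function to get bootstrap 95% confidence intervals. In this version the statistic is the coefficient vector of a linear regression of Q1 on Q2 (the intercept is the mean of Q1 when Q2 is 0).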
Here is my code:
# Simulated data: SYSTEM plus n columns of 0/1 values (Q1..Qn)
m <- 1e4   # number of rows
n <- 5     # number of Q columns
set.seed(42)
# each = n/2 is truncated to 2, so the labels cycle S1, S1, S2, S2, ... down the rows
df2 <- data.frame(SYSTEM = rep(c('S1', 'S2'), each = n/2),
                  matrix(sample(0:1, m*n, replace = TRUE), m, n))
names(df2)[-1] <- paste0('Q', 1:n)

set.seed(0)
library(boot)

# Define function to calculate fitted regression coefficients
coef_function <- function(formula, data, indices) {
  d <- data[indices, ]           # allows boot to select the resample
  fit <- lm(formula, data = d)   # fit regression model
  return(coef(fit))              # return coefficient estimates of model
}

# Perform bootstrapping with 2000 replications
reps <- boot(data = df2, statistic = coef_function, R = 2000, formula = Q1 ~ Q2)

# View results of bootstrapping
reps

# Calculate adjusted bootstrap percentile (BCa) intervals
boot.ci(reps, type = "bca", index = 1)  # intercept of model
boot.ci(reps, type = "bca", index = 2)  # Q2 predictor (slope)
The output is:
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = df2, statistic = coef_function, R = 2000, formula = Q1 ~
Q2)
Bootstrap Statistics :
original bias std. error
t1* 0.600 0.00082 0.074
t2* -0.073 -0.00182 0.099
> boot.ci(reps, type="bca", index=1) #intercept of model
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates
CALL :
boot.ci(boot.out = reps, type = "bca", index = 1)
Intervals :
Level BCa
95% ( 0.45, 0.74 )
Calculations and Intervals on Original Scale
> boot.ci(reps, type="bca", index=2) #disp predictor variable
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates
CALL :
boot.ci(boot.out = reps, type = "bca", index = 2)
Intervals :
Level BCa
95% (-0.26, 0.13 )
Calculations and Intervals on Original Scale
Here I'm only using Q1 and Q2, and I'm not grouping by SYSTEM.
Is it possible to do this for each group and for all the Q columns at once? Thank you in advance.
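Since the question describes this as a bootstrap CI for a mean, it may help to see the indexing pattern with the simplest possible statistic. boot() calls the statistic with the data and a vector of resampled indices, and boot.ci() stores each interval type as a matrix whose last two columns are the lower and upper endpoints, which is handy when many intervals have to be collected. A minimal sketch; the vector x and the function name mean_function are hypothetical:

```r
library(boot)

# Statistic for boot(): receives the data and the resampled indices,
# and returns the statistic computed on that resample
mean_function <- function(data, indices) {
  mean(data[indices])
}

set.seed(1)
x <- sample(0:1, 200, replace = TRUE)  # hypothetical 0/1 responses

reps_mean <- boot(data = x, statistic = mean_function, R = 2000)
ci_mean <- boot.ci(reps_mean, type = "bca")

# ci_mean$bca is a 1 x 5 matrix: level, two order-statistic positions,
# then the lower and upper BCa endpoints
bca_bounds <- ci_mean$bca[1, 4:5]
bca_bounds
```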
If 'Q1' is the response variable, we can group by 'SYSTEM', loop across the columns 'Q2' to 'Q5', create the formula from each column name (cur_column()) with 'Q1' as the response using reformulate, and pass it on to boot:
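library(boot)
library(dplyr)

# coef_function is the statistic defined in the question.
# For each SYSTEM group, bootstrap Q1 ~ Qk for every column Q2..Q5 and
# store each boot object in a list-column.
# (In dplyr >= 1.1.0 cur_data() is deprecated; pick(everything()) replaces it.)
out <- df2 %>%
  group_by(SYSTEM) %>%
  summarise(across(Q2:Q5,
                   ~ list(boot(cur_data(), statistic = coef_function, R = 2000,
                               formula = reformulate(cur_column(), response = 'Q1')))),
            .groups = 'drop')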
-output
> out
# A tibble: 2 × 5
SYSTEM Q2 Q3 Q4 Q5
<chr> <list> <list> <list> <list>
1 S1 <boot> <boot> <boot> <boot>
2 S2 <boot> <boot> <boot> <boot>
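Each cell of out holds a boot object whose formula was built by reformulate() from the current column name. Outside of across(), the same call can be tried by hand; the vector predictors below is only an illustration of the Q2 to Q5 names:

```r
# reformulate() turns a column name into a model formula,
# which is what the answer does with cur_column() inside across()
f <- reformulate("Q2", response = "Q1")
f  # prints Q1 ~ Q2

# One formula per predictor column
predictors <- paste0("Q", 2:5)
formulas <- lapply(predictors, reformulate, response = "Q1")
sapply(formulas, deparse)  # "Q1 ~ Q2" "Q1 ~ Q3" "Q1 ~ Q4" "Q1 ~ Q5"
```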
If we extract a column, we get one boot object per group (S1, then S2):
> out$Q2
[[1]]
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = cur_data(), statistic = coef_function, R = 2000,
formula = reformulate(cur_column(), response = "Q1"))
Bootstrap Statistics :
original bias std. error
t1* 0.48025529 -0.0001032709 0.01019634
t2* 0.02355538 0.0003813531 0.01412119
[[2]]
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = cur_data(), statistic = coef_function, R = 2000,
formula = reformulate(cur_column(), response = "Q1"))
Bootstrap Statistics :
original bias std. error
t1* 0.49564873 -0.0002947112 0.009942382
t2* 0.01850984 0.0003610360 0.013914520
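If the end goal is a table of intervals rather than boot objects, one possible base-R alternative (a swapped-in approach, not the answer's dplyr code) is to loop over every SYSTEM and predictor pair and keep only the BCa endpoints of the slope. This is a sketch under stated assumptions: the smaller data size (1,000 rows), R = 1000, and the helper name bca_row are hypothetical choices to keep it quick, and coef_function is redefined so the block runs on its own:

```r
library(boot)

# Hypothetical smaller version of the question's simulated data
m <- 1000
set.seed(42)
df2 <- data.frame(SYSTEM = rep(c("S1", "S2"), each = m / 2),
                  matrix(sample(0:1, m * 5, replace = TRUE), m, 5))
names(df2)[-1] <- paste0("Q", 1:5)

# Same statistic as in the question: regression coefficients of a resample
coef_function <- function(formula, data, indices) {
  coef(lm(formula, data = data[indices, ]))
}

# Hypothetical helper: bootstrap Q1 ~ predictor within one SYSTEM
# and return the BCa interval for the slope (index = 2) as one row
bca_row <- function(sys, predictor, R = 1000) {
  d <- df2[df2$SYSTEM == sys, ]
  reps <- boot(d, statistic = coef_function, R = R,
               formula = reformulate(predictor, response = "Q1"))
  ci <- boot.ci(reps, type = "bca", index = 2)
  data.frame(SYSTEM = sys, predictor = predictor,
             slope = unname(reps$t0[2]),
             lower = ci$bca[1, 4], upper = ci$bca[1, 5])
}

# All SYSTEM x predictor combinations
grid <- expand.grid(SYSTEM = c("S1", "S2"), predictor = paste0("Q", 2:5),
                    stringsAsFactors = FALSE)
rows <- lapply(seq_len(nrow(grid)),
               function(i) bca_row(grid$SYSTEM[i], grid$predictor[i]))
results <- do.call(rbind, rows)
rownames(results) <- NULL
results
```

Keeping the numbers in a plain data frame makes it easy to filter or plot the intervals afterwards, at the cost of discarding the full boot objects that the answer keeps.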