cat cat

Reputation: 65

R: How to modify my code to group by then loop over all columns at once

I have a data frame with many columns. The first column contains categories such as "System 1" and "System 2" (S1 and S2 below), and the remaining columns contain 0s and 1s.

For example:

SYSTEM Q1 Q2
S1 0 1
S1 1 0
S2 1 1
S2 0 0
S2 1 1

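I have this R code that uses the boot package to bootstrap a statistic (the statistic function receives the resampled row indices) and then computes 95% confidence intervals. Here the statistic is the pair of coefficients from a linear regression.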

Here is my code:

m <- 1e4
n <- 5
set.seed(42)
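# note: each=n/2 is 2.5, which rep() truncates to 2, so the labels
# S1, S1, S2, S2 are recycled down the m rows
df2 <- data.frame(SYSTEM=rep(c('S1', 'S2'), each=n/2), matrix(sample(0:1, m*n, replace=TRUE), m, n))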
names(df2)[-1] <- paste0('Q', 1:n)



set.seed(0)
library(boot)


#define function to calculate fitted regression coefficients
coef_function <- function(formula, data, indices) {
  d <- data[indices,] #allows boot to select sample
  fit <- lm(formula, data=d) #fit regression model
  return(coef(fit)) #return coefficient estimates of model
}

#perform bootstrapping with 2000 replications
reps <- boot(data=df2, statistic=coef_function, R=2000, formula=Q1~Q2)

#view results of bootstrapping
reps

#calculate adjusted bootstrap percentile (BCa) intervals
boot.ci(reps, type="bca", index=1) #intercept of model
boot.ci(reps, type="bca", index=2) #slope for the Q2 predictor

The result should be:

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = df2, statistic = coef_function, R = 2000, formula = Q1 ~ 
    Q2)


Bootstrap Statistics :
    original   bias    std. error
t1*    0.600  0.00082       0.074
t2*   -0.073 -0.00182       0.099
> boot.ci(reps, type="bca", index=1) #intercept of model
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = reps, type = "bca", index = 1)

Intervals : 
Level       BCa          
95%   ( 0.45,  0.74 )  
Calculations and Intervals on Original Scale
> boot.ci(reps, type="bca", index=2) #slope for the Q2 predictor
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = reps, type = "bca", index = 2)

Intervals : 
Level       BCa          
95%   (-0.26,  0.13 )  
Calculations and Intervals on Original Scale

Here I'm only using Q1 and Q2, and I'm not grouping by SYSTEM.

I don't know whether it is possible to do this for all groups and all columns at once. Thank you in advance.

Upvotes: 0

Views: 76

Answers (1)

akrun

Reputation: 887118

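If 'Q1' is the response variable, we can group by 'SYSTEM' and then loop across the columns 'Q2' to 'Q5' with across(). For each column, reformulate() builds the formula from the column name (cur_column()) with 'Q1' as the response, and that formula is passed on to boot(). Each result is wrapped in list() so that summarise() can store the boot objects in list columns.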

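library(boot)
library(dplyr)
# coef_function() is the statistic defined in the question
# note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()
out <- df2 %>% 
    group_by(SYSTEM) %>%
    summarise(across(Q2:Q5, 
        ~ list(boot(cur_data(), statistic = coef_function, R = 2000,
                    formula = reformulate(cur_column(), response = 'Q1')))),
        .groups = 'drop')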

-output

> out
# A tibble: 2 × 5
  SYSTEM Q2     Q3     Q4     Q5    
  <chr>  <list> <list> <list> <list>
1 S1     <boot> <boot> <boot> <boot>
2 S2     <boot> <boot> <boot> <boot>
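
As a side note on the formula step: reformulate() builds a formula from character strings, which is what lets the column name returned by cur_column() become the predictor. A minimal base R sketch of just that step, independent of dplyr (the names `predictors`, `formulas` and `formula_text` are only for illustration):

```r
# Predictor column names as they appear in df2
predictors <- paste0("Q", 2:5)

# One formula per predictor, each with Q1 as the response
formulas <- lapply(predictors, reformulate, response = "Q1")
names(formulas) <- predictors

# Show the formulas as text; the first one is "Q1 ~ Q2"
formula_text <- vapply(formulas, deparse, character(1))
formula_text
```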

If we extract the column, the output will be

> out$Q2
[[1]]

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = cur_data(), statistic = coef_function, R = 2000, 
    formula = reformulate(cur_column(), response = "Q1"))


Bootstrap Statistics :
      original        bias    std. error
t1* 0.48025529 -0.0001032709  0.01019634
t2* 0.02355538  0.0003813531  0.01412119

[[2]]

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = cur_data(), statistic = coef_function, R = 2000, 
    formula = reformulate(cur_column(), response = "Q1"))


Bootstrap Statistics :
      original        bias    std. error
t1* 0.49564873 -0.0002947112 0.009942382
t2* 0.01850984  0.0003610360 0.013914520
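
To go from boot objects to actual interval endpoints, boot.ci() can be applied per group and per column. Below is a minimal, self-contained sketch in base R rather than dplyr, under a few assumptions: `toy` is a hypothetical, smaller stand-in for df2, R is reduced to 500 to keep it quick, and percentile intervals (type = "perc") are used instead of BCa, because as far as I know BCa in boot can stop with an "estimated adjustment 'a' is NA" error when there are fewer replicates than rows.

```r
library(boot)

# Hypothetical small data set with the same shape as df2 (SYSTEM plus Q1..Q5)
set.seed(42)
m <- 200
toy <- data.frame(SYSTEM = rep(c("S1", "S2"), each = m / 2),
                  matrix(sample(0:1, m * 5, replace = TRUE), m, 5))
names(toy)[-1] <- paste0("Q", 1:5)

# Same statistic as in the question: coefficients of lm() on the resampled rows
coef_function <- function(formula, data, indices) {
  d <- data[indices, ]
  coef(lm(formula, data = d))
}

predictors <- paste0("Q", 2:5)

# One row per (SYSTEM, predictor): slope estimate and 95% percentile interval
ci_table <- do.call(rbind, lapply(split(toy, toy$SYSTEM), function(g) {
  do.call(rbind, lapply(predictors, function(p) {
    b <- boot(g, statistic = coef_function, R = 500,
              formula = reformulate(p, response = "Q1"))
    # columns 4 and 5 of $percent hold the lower and upper endpoints
    ci <- boot.ci(b, type = "perc", index = 2)$percent[4:5]
    data.frame(SYSTEM = g$SYSTEM[1], predictor = p,
               slope = unname(b$t0[2]), lower = ci[1], upper = ci[2])
  }))
}))
rownames(ci_table) <- NULL
ci_table
```

The same boot.ci(...)$percent[4:5] extraction should also work on the individual elements of a list column such as out$Q2, since each element is an ordinary boot object.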

Upvotes: 1
