Rank and choose the 5 models that showed the best result for each group

Question

Here is some example data:

ind1 <- rnorm(99)
ind2 <- rnorm(99)
ind3 <- rnorm(99)
ind4 <- rnorm(99)
ind5 <- rnorm(99)
dep <- rnorm(99, mean=ind1)
group <- rep(c("A", "B", "C"), each=33)
df <- data.frame(dep,group, ind1, ind2, ind3, ind4, ind5)

Here, a simple linear regression model has been fitted for every combination of the independent variables in df, separately for each group of the categorical variable. The result is satisfactory, but my original data has many more than 5 variables, so the resulting list (tibble_list) is hard to read and compare. I would therefore like to choose the best 5 models for each group from tibble_list, based on their AIC values. Any help would be highly appreciated.
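As background on the ranking criterion: AIC is computed as AIC = 2k - 2 ln(L), where L is the maximised likelihood and k the number of estimated parameters, and lower values are better, so "best" here means smallest AIC. For lm() fits, k is the number of coefficients plus one for the residual standard deviation. The small base-R check below illustrates this on its own simulated data; the names x, y, fit, ll and k are illustrative only.

```r
# Sketch: reproduce AIC() for an lm() fit from its log-likelihood.
set.seed(1)                 # arbitrary seed, only for reproducibility
x <- rnorm(33)
y <- rnorm(33, mean = x)
fit <- lm(y ~ x)

ll <- logLik(fit)
k  <- attr(ll, "df")        # 3 here: intercept, slope and residual sd
manual_aic <- 2 * k - 2 * as.numeric(ll)

all.equal(manual_aic, AIC(fit))  # TRUE
```

AIC values are only comparable between models fitted to the same observations, which holds within each group in the question (every model in a group uses the same 33 rows), but not across groups. That is why the ranking has to be done per group.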

indvar_list <- lapply(1:5, function(x) 
  combn(paste0("ind", 1:5), x, simplify = FALSE))

formulas_list <- rapply(indvar_list, function(x)
  as.formula(paste("dep ~", paste(x, collapse="+"))))
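The same list of formulas can also be built without rapply(), by flattening the per-size lists with unlist(recursive = FALSE) and turning each subset into a formula with reformulate(). A minimal base-R sketch; the names predictors, subsets and formulas are illustrative, not from the question:

```r
# Sketch: one formula per non-empty subset of the predictors.
predictors <- paste0("ind", 1:5)

# All subsets of size 1..5, flattened into one list of character vectors
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# reformulate() builds `dep ~ a + b + ...` from a character vector of terms
formulas <- lapply(subsets, reformulate, response = "dep")

length(formulas)   # 31, i.e. 2^5 - 1 candidate models
formulas[[6]]      # dep ~ ind1 + ind2 (first two-predictor subset)
```

The number of candidate models is 2^p - 1 for p predictors, so with the larger original data the list grows quickly (20 predictors already give 1,048,575 models per group). This is another reason to keep only the top few models per group.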

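library(tidyverse)  # dplyr, tidyr, purrr
library(broom)      # glance(), tidy()

run_model <- function(f) {    
  df %>% 
    nest(data = -group) %>% 
    mutate(fit = map(data, ~ lm(f, data = .x)),
           # model-level statistics from glance() ...
           results1 = map(fit, ~ select(glance(.x), r.squared, p.value, AIC)),
           # ... and coefficient-level columns from tidy(), selected so that
           # the two unnest() calls do not create duplicate column names
           results2 = map(fit, ~ select(tidy(.x), term, estimate))) %>% 
    unnest(results1) %>% 
    unnest(results2) %>% 
    select(group, term, estimate, r.squared, p.value, AIC) %>% 
    mutate(estimate = exp(estimate))
}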

tibble_list <- lapply(formulas_list, run_model)
tibble_list

akrun · Accepted Answer

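One option is to bind the rows into a single dataset with an .id column ('index'), arrange by 'group' and 'AIC', and then, grouped by 'group', keep only the rows whose 'index' is among the first five unique values, i.e. all coefficient rows of the five lowest-AIC models in each group: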

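library(tidyverse)
bind_rows(tibble_list, .id = 'index') %>%     # 'index' = position of the model in tibble_list
    arrange(group, AIC) %>%                     # lowest AIC first within each group
    group_by(group) %>% 
    filter(index %in% head(unique(index), 5))   # keep every term row of the 5 best models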
# A tibble: 51 x 7
# Groups:   group [3]
#   index group term        estimate r.squared  p.value   AIC
#                         
# 1 1     A     (Intercept)    0.897     0.319 0.000620  79.5
# 2 1     A     ind1           2.07      0.319 0.000620  79.5
# 3 7     A     (Intercept)    0.883     0.358 0.00129   79.5
# 4 7     A     ind1           2.14      0.358 0.00129   79.5
# 5 7     A     ind3           0.849     0.358 0.00129   79.5
# 6 8     A     (Intercept)    0.890     0.351 0.00153   79.9
# 7 8     A     ind1           2.12      0.351 0.00153   79.9
# 8 8     A     ind4           0.860     0.351 0.00153   79.9
# 9 19    A     (Intercept)    0.877     0.387 0.00237   80.0
#10 19    A     ind1           2.18      0.387 0.00237   80.0
## … with 41 more rows
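Because AIC belongs to a whole model rather than to individual coefficient rows, the same selection can also be done on a table with one row per model and group. A minimal base-R sketch, assuming the simulated data from the question (with an arbitrary seed added) and illustrative names formulas, scores and best5:

```r
# Sketch: rank one row per (model, group) by AIC and keep the 5 lowest per group.
set.seed(1)  # assumption: any seed; the question did not fix one
n <- 99
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 33),
  ind1 = rnorm(n), ind2 = rnorm(n), ind3 = rnorm(n),
  ind4 = rnorm(n), ind5 = rnorm(n)
)
df$dep <- rnorm(n, mean = df$ind1)

# All non-empty subsets of the predictors, as formulas
predictors <- paste0("ind", 1:5)
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)
formulas <- lapply(subsets, reformulate, response = "dep")

# Fit each formula separately in each group and record a single AIC per fit
scores <- do.call(rbind, lapply(seq_along(formulas), function(i) {
  fits <- lapply(split(df, df$group), function(d) lm(formulas[[i]], data = d))
  data.frame(index = i,
             group = names(fits),
             model = paste(deparse(formulas[[i]]), collapse = " "),
             AIC   = vapply(fits, AIC, numeric(1)),
             row.names = NULL)
}))

# Within each group, order by AIC and keep the first 5 models
best5 <- do.call(rbind, lapply(split(scores, scores$group), function(s) {
  head(s[order(s$AIC), ], 5)
}))
rownames(best5) <- NULL
best5
```

In dplyr 1.0.0 or later, the last step can also be written as scores %>% group_by(group) %>% slice_min(AIC, n = 5, with_ties = FALSE); with_ties = FALSE guarantees exactly five models per group even if two models share an AIC value.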
