Run linear model in a powerset of variables

I am trying to get a data frame with different variables and run a linear model for each combination of those variables.

A simple example is:

names <- c("Var1", "Var2", "Var3") 
vars <- ggm::powerset(names, sort = T, nonempty = T) 

The powerset function gives me all the combinations of the 3 variables -- a list with 7 elements, each element is of type character. (the actual code I am trying to run has 16 variables, that's why I don´t want to manually write each of the models).

What I would like to do now is run the models in each of these combinations of variables. For now, I have written the following code (it does not work, but might be a good start):

i <- 1
for (dep_var in vars){
  assign(paste0("modelo", i), lm(lapply(paste("Y", dep_var, sep = "~"), formula), 
  data = data))
  i <- i+1
}

Ultimately, I would like to create different models and select the best model from a combination of in sample AIC and BIC and out of sample MAE and RMSE.

Thank you very much! Any help is appreciated!

Upvotes: 2

Views: 216

Answers (1)

paoloeusebi
paoloeusebi

Reputation: 1086

A simple example working with mtcars data with mpg as dependent variable and the collection of AIC for model selection. I agree that there are lot of R packages for testing multiple models according to some criterion (bestglm, leaps, glmulti, MASS).

mtcars_exp <- mtcars %>%
  select(-mpg)

vars <- ggm::powerset(names(mtcars_exp), sort = T, nonempty = T) 

results <- data.frame(AIC_lm_fit =  vector("numeric"),
           model = vector("character"))

For loop for fitting linear models

for (j in 1:length(vars)){
  lm_fit <- lm(mpg ~ ., data = mtcars[,c("mpg", vars[[j]])])
  results <- rbind(results,
                   data.frame(AIC_lm_fit =  AIC(lm_fit),
                            model = toString(vars[[j]])))
}

Selection of the model with the minimum AIC

results[AIC_lm_fit==min(AIC_lm_fit),]
        AIC_lm_fit        model
105   154.1194 wt, qsec, am

Comparison with the results from MASS::stepAIC function

library(MASS)
lm_fit <- lm(mpg ~ ., data = mtcars)
lm_fit_2 <- stepAIC(lm_fit)
lm_fit_2$call

lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Upvotes: 3

Related Questions