Tianjian Qin
Tianjian Qin

Reputation: 631

How to build and test multiple models in R

Suppose I have a dataset like (forget about the distribution):

modData <- data.frame("A" = rnorm(20, 15, 3),
                      "B" = rnorm(20, 20, 3),
                      "C" = rnorm(20, 25, 3),
                      "X" = rnorm(20, 5, 1)
                      )

If I use X as a predictor, A, Band C as responses, respectively:

md1 <- lm(A ~ X, data = modData)
md2 <- lm(B ~ X, data = modData)
md3 <- lm(C ~ X, data = modData)

Then do a Shapiro test and a boxcox test to every model, e.g. :

shapiro.test(residuals(md1))
boxcox(md1, plotit = T)

Is there a convenient way to build and test multiple models without manually typing each of them?

Upvotes: 2

Views: 1372

Answers (4)

Seyma Kalay
Seyma Kalay

Reputation: 2859

data("mtcars")

formulas <- list(
  mpg ~ disp,
  mpg ~ disp + wt
)

res <- vector("list", length = length(formulas))

for(i in seq_along(formulas)){
  res[[i]] <- lm(formulas[[i]], data = mtcars)}
res

lapply(formulas, lm, data = mtcars)

Upvotes: 1

Roman Luštrik
Roman Luštrik

Reputation: 70633

In case you wouldn't want to introduce dozens of dependencies, you can do it with a simple sapply. Notice that I don't provide a boxcox part because I don't know where it comes from (car, MASS?).

modData <- data.frame("A" = rnorm(20, 15, 3),
                      "B" = rnorm(20, 20, 3),
                      "C" = rnorm(20, 25, 3),
                      "X" = rnorm(20, 5, 1))

deps <- c("A", "B", "C")
indeps <- c("X")

result <- sapply(deps, FUN = function(x, indeps, mydata) {
  myformula <- formula(sprintf("%s ~ %s", x, indeps))

  model <- lm(myformula, data = mydata)
  out.shapiro <- shapiro.test(residuals(model))

  return(list(model = model, shapiro = out.shapiro))
}, indeps = indeps, mydata = modData, simplify = FALSE)

Upvotes: 2

AntoniosK
AntoniosK

Reputation: 16121

Here's an alternative approach using tidyverse:

modData <- data.frame("A" = rnorm(20, 15, 3),
                      "B" = rnorm(20, 20, 3),
                      "C" = rnorm(20, 25, 3),
                      "X" = rnorm(20, 5, 1))
library(tidyverse)
library(broom)

# specify predictor and target variables
x = "X"
y = names(modData)[names(modData)!= x]

expand.grid(y,x) %>%                                    # create combinations
  mutate(model_id = row_number(),                       # create model id
         frml = paste0(Var1, "~", Var2)) %>%            # create model formula
  group_by(model_id, Var1, Var2) %>%                    # group by the above
  nest() %>%                                            # nest data
  mutate(m = map(data, ~lm(.$frml, data = modData)),    # create models
         m_table = map(m, ~tidy(.)),                    # tidy model output
         st = map(m, ~shapiro.test(residuals(.)))) -> dt_model_info  # shapiro test

# access model info
dt_model_info
dt_model_info$m
dt_model_info$m_table
dt_model_info$st

# another way to access info
dt_model_info %>% unnest(m_table)

Upvotes: 5

Andrii
Andrii

Reputation: 3043

Here is an approach by using simple "lapply":

# 1. Data set
df <- data.frame(
  a = rnorm(20, 15, 3),
  b = rnorm(20, 20, 3),
  c = rnorm(20, 25, 3),
  x = rnorm(20, 5, 1))

# 2. Models
fit_lm_a <- lm(a ~ x, df)
fit_lm_b <- lm(b ~ x, df)
fit_lm_c <- lm(c ~ x, df)

# 3. List of models
list_fit_lm <- list(fit_lm_a, fit_lm_b,fit_lm_c)

# 3. Shapiro test
lapply(
  list_fit_lm, function(x) {
    shapiro.test(residuals(x)) 
  })

 # 4. Box-Cox Transformations
 lapply(
  list_fit_lm, function(x) {
    boxcox(x, plotit = TRUE, data = df)
  }
 )

Upvotes: 3

Related Questions