Isobel M

Reputation: 55

Looping over different variables and datasets in R regression

I can run the same regression on different data frames using a pipe and purrr's map() (see below):

library(dplyr)
library(purrr)

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x))

However, what if I am interested in seeing results for the same regression but for a range of different dependent variables, e.g. "mpg", "hp", "drat"? Is there a fast way to do this using loops?

I have tried nested lapply loops, group_by, etc., but I can't seem to find a solution.

Any help would be great.

Upvotes: 0

Views: 562

Answers (4)

akrun

Reputation: 887951

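We can use nest_by() from dplyr to fit one model per value of cyl (here mpg ~ . regresses mpg on all remaining columns):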

library(dplyr)
mtcars %>% 
   nest_by(cyl) %>% 
   mutate(model = list(lm(mpg~ ., data = data)))
# A tibble: 3 x 3
# Rowwise:  cyl
#    cyl                data model 
#  <dbl> <list<tbl_df[,10]>> <list>
#1     4           [11 × 10] <lm>  
#2     6            [7 × 10] <lm>  
#3     8           [14 × 10] <lm>  

Upvotes: 0

codez0mb1e

Reputation: 906

This might look like:

library(dplyr)
library(purrr)  # provides map()

label <- "mpg"
features <- setdiff(names(mtcars), label)

# Build a formula such as mpg ~ cyl from a predictor name
generate_formula <- function(feature) sprintf("%s ~ %s", label, feature) %>% as.formula

features %>% 
  map(~ lm(generate_formula(.x), data = mtcars))

Output:

[[1]]

Call:
lm(formula = generate_formula(.x), data = mtcars)

Coefficients:
(Intercept)          cyl  
     37.885       -2.876  


[[2]]

Call:
lm(formula = generate_formula(.x), data = mtcars)

Coefficients:
(Intercept)         disp  
   29.59985     -0.04122  


[[3]]

Call:
lm(formula = generate_formula(.x), data = mtcars)

Coefficients:
(Intercept)           hp  
   30.09886     -0.06823
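
The same pattern can be turned around to vary the dependent variable, which is what the question asks for. A minimal sketch in base R, assuming wt stays the only predictor (as in the question's original model) and swapping sprintf() for reformulate(); the names responses and models are only illustrative:

```r
# Dependent variables named in the question; wt as the predictor is an assumption
responses <- c("mpg", "hp", "drat")

# reformulate("wt", response = "mpg") builds the formula mpg ~ wt.
# Naming the input vector makes lapply() return a named list of models.
models <- lapply(
  setNames(responses, responses),
  function(y) lm(reformulate("wt", response = y), data = mtcars)
)

# One row of coefficients (intercept and slope) per dependent variable
t(sapply(models, coef))
```

Because the list is named, an individual fit can be pulled out with, for example, summary(models$hp).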

Upvotes: 0

r.user.05apr

Reputation: 5456

The tidy loop could be:

library(tidyverse)

# Regress mpg on each remaining column in turn; .x is the column vector itself,
# so lm() needs no data argument
mtcars %>%
  select(-mpg) %>%
  map(~ lm(mtcars$mpg ~ .x))

Upvotes: 0

Allan Cameron

Reputation: 174586

It sounds like you wish to loop through the column names of each data frame. Effectively you need a double map or double lapply. Something like this would work:

mtcars %>%
  split(.$cyl) %>%
  lapply(function(x) {
    lapply(paste("mpg ~", names(x)[-1]), function(y) {
      lm(formula = as.formula(y), data = x)
    })
  })

#> $`4`
#> $`4`[[1]]
#> 
#> Call:
#> lm(formula = as.formula(y), data = x)
#> 
#> Coefficients:
#> (Intercept)          cyl  
#>       26.66           NA  
#> 
#> 
#> $`4`[[2]]
#> 
#> Call:
#> lm(formula = as.formula(y), data = x)
#> 
#> Coefficients:
#> (Intercept)         disp  
#>     40.8720      -0.1351  

# ... etc (very long list)
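
The double lapply above varies the predictor. To vary the dependent variable within each cyl group instead, the inner loop can run over response names. This is only a sketch under the assumption that wt remains the single predictor; responses and fits are illustrative names, not part of the original answer:

```r
# Dependent variables from the question; wt as the predictor is an assumption
responses <- c("mpg", "hp", "drat")

# Outer loop: one data frame per value of cyl.
# Inner loop: one model per dependent variable, fitted on that subset.
fits <- lapply(split(mtcars, mtcars$cyl), function(d) {
  lapply(setNames(responses, responses), function(y) {
    lm(reformulate("wt", response = y), data = d)
  })
})

# Access a single model as fits[["<cyl>"]][["<response>"]]
coef(fits[["4"]][["hp"]])
```

The result is a named list of named lists, so each model can be addressed by group and response rather than by position.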

Upvotes: 1
