Daniel Valencia C.

Reputation: 2279

How to use the dplyr package to do group-separated linear regressions in R?

I have a dataset of x and y values split into two categories (a and b). I want to fit two linear regressions, one on the category a data and one on the category b data. To do this, I used the dplyr package following this answer. My code is simpler than the code in that answer, yet the regressions fail with the error shown below. Any tips?

library(dplyr)

Factor <- c("a", "b")
x <- seq(0,3,1)

df <- expand.grid(x = x, Factor = Factor)

df$y <- rnorm(8)

df %>%
  group_by(Factor) %>%
  do(lm(formula = y ~ x,
        data = .))

Error: Results 1, 2 must be data frames, not lm

Upvotes: 0

Views: 89

Answers (2)

G. Grothendieck

Reputation: 269371

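The error occurs because do() expects each group's result to be a data frame unless the expression is named, and an lm object is not a data frame. Wrapping each fit in list() inside summarize() instead creates a list column whose components are lm objects: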

df2 <- df %>%
  group_by(Factor) %>%
  summarize(lm = list(lm(formula = y ~ x, data = cur_data())), .groups = "drop")

giving:

> df2
# A tibble: 2 x 2
  Factor lm    
  <fct>  <list>
1 a      <lm>  
2 b      <lm>  

> with(df2, setNames(lm, Factor))
$a

Call:
lm(formula = y ~ x, data = cur_data())

Coefficients:
(Intercept)            x  
    -0.3906       0.2947  


$b

Call:
lm(formula = y ~ x, data = cur_data())

Coefficients:
(Intercept)            x  
     0.2684      -0.3403  
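
If the goal is a table of coefficients rather than the model objects themselves, the list column can be unpacked with base R's coef(). The following is a minimal sketch, not the only way to do it. It assumes dplyr 1.1.0 or later, where cur_data() is deprecated and pick(everything()) takes its place. The set.seed() call and the names fits, coefs, intercept and slope are additions for illustration only.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- expand.grid(x = 0:3, Factor = c("a", "b"))
df$y <- rnorm(nrow(df))

# One row per group; each fit is wrapped in list() so it can live in a list column.
# pick(everything()) supplies the current group's non-grouping columns,
# the role cur_data() plays in the answer above (assumes dplyr >= 1.1.0).
fits <- df %>%
  group_by(Factor) %>%
  summarize(model = list(lm(y ~ x, data = pick(everything()))), .groups = "drop")

# Pull the intercept and slope out of each stored model.
coefs <- fits %>%
  mutate(intercept = sapply(model, function(m) coef(m)[["(Intercept)"]]),
         slope     = sapply(model, function(m) coef(m)[["x"]])) %>%
  select(-model)

coefs
```

On older dplyr versions, keeping cur_data() in place of pick(everything()) should behave the same way.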

Upvotes: 1

ktiu

Reputation: 2626

Here is my approach, which splits the data frame by Factor and returns a named list of lm objects:

df %>%
  split(~ Factor) %>%
  purrr::map(\(d) lm(formula = y ~ x, data = d))
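
Both the formula form of split() and the \(x) lambda shorthand need R 4.1 or later, as far as I know. A base R sketch of the same idea that should also run on older versions, and that collects the coefficients into one matrix, could look like this. The set.seed() call and the names models and coef_table are assumptions added for illustration.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- expand.grid(x = 0:3, Factor = c("a", "b"))
df$y <- rnorm(nrow(df))

# Split into one data frame per level of Factor, then fit a model on each piece.
# The result is a list named by the factor levels ("a" and "b").
models <- lapply(split(df, df$Factor), function(d) lm(y ~ x, data = d))

# sapply() returns one column per model; t() turns that into one row per group.
coef_table <- t(sapply(models, coef))
coef_table
```

Individual fits remain available by name, for example summary(models$a).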

Upvotes: 0
