Is it possible to create separate linear models for each group in dplyr's summarize?

Question

I have some data like this:

group_name | x | y
------------------
a          | 1 | 2
a          | 2 | 4
a          | 3 | 6
b          | 1 | 4
b          | 2 | 3
b          | 3 | 2
c          | 1 | 2
c          | 2 | 5
c          | 3 | 8

I would like to group it by group_name and use dplyr's summarize() function to create a column containing a linear model, lm(y ~ x), for each group. Is this possible? If not, what are the alternatives for fitting one model per group?

Thank you in advance

Jon Spring · Accepted Answer

Adapting the example from https://cran.r-project.org/web/packages/broom/vignettes/broom_and_dplyr.html:

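library(tidyverse); library(broom)
df <- tibble(group_name = rep(c("a", "b", "c"), each = 3),  # data from the question
             x = rep(1:3, times = 3),
             y = c(2, 4, 6, 4, 3, 2, 2, 5, 8))

df %>%
  nest(data = -group_name) %>%                    # one row per group; x and y go in a list-column "data"
  mutate(fit = map(data, ~lm(y ~ x, data = .x)),  # fit one lm per group
         tidied = map(fit, tidy)) %>%             # coefficient table for each fit
  select(group_name, tidied) %>%                  # drop the other list-columns before unnesting
  unnest(tidied)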

  group_name        term estimate    std.error     statistic      p.value
1          a (Intercept)        0 0.000000e+00           NaN          NaN
2          a           x        2 0.000000e+00           Inf 0.000000e+00
3          b (Intercept)        5 1.017536e-15  4.913830e+15 1.295567e-16
4          b           x       -1 4.710277e-16 -2.123017e+15 2.998656e-16
5          c (Intercept)       -1 1.356715e-15 -7.370745e+14 8.637116e-16
6          c           x        3 6.280370e-16  4.776789e+15 1.332736e-16
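To address the summarize() part of the question directly: a minimal sketch, assuming dplyr 1.1.0 or later (needed for pick()), is to wrap each model in list() so that summarise() stores it as a list-column, one model per group. The slope extraction with coef() is an illustrative addition, not part of the answer above.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

df <- tibble(group_name = rep(c("a", "b", "c"), each = 3),
             x = rep(1:3, times = 3),
             y = c(2, 4, 6, 4, 3, 2, 2, 5, 8))

models <- df %>%
  group_by(group_name) %>%
  # list() wraps each lm object so summarise() can keep it in a list-column
  summarise(fit = list(lm(y ~ x, data = pick(x, y))), .groups = "drop")

# Pull the slope out of each stored model
models <- models %>%
  mutate(slope = sapply(fit, function(m) coef(m)[["x"]]))
models
```

Keeping the fitted lm objects in a list-column means you can still call summary(), predict(), or broom::tidy() on them later.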

Edit: One way to get the predictions is to use augment from broom:

library(tidyverse); library(broom)
df %>%
  nest(data = -group_name) %>%
  mutate(fit = map(data, ~lm(y ~ x, data = .x)),
         predictions = map(fit, augment)) %>%   # fitted values, residuals, etc. for each row
  select(group_name, predictions) %>%
  unnest(predictions)

   group_name y x .fitted      .se.fit        .resid      .hat .sigma .rownames .cooksd .std.resid
1 a           2 1       2 0.000000e+00  0.000000e+00 0.8333333    NaN            NA         NA
2 a           4 2       4 0.000000e+00  0.000000e+00 0.3333333    NaN            NA         NA
3 a           6 3       6 0.000000e+00  0.000000e+00 0.8333333    NaN            NA         NA
4 b           4 1       4 6.080942e-16  2.719480e-16 0.8333333    NaN         4    2.50          1
5 b           3 2       3 3.845925e-16 -5.438960e-16 0.3333333    NaN         5    0.25         -1
6 b           2 3       2 6.080942e-16  2.719480e-16 0.8333333    Inf         6    2.50          1
7 c           2 1       2 8.107923e-16 -3.625973e-16 0.8333333    NaN         7    2.50         -1
8 c           5 2       5 5.127900e-16  7.251946e-16 0.3333333    NaN         8    0.25          1
9 c           8 3       8 8.107923e-16 -3.625973e-16 0.8333333    Inf         9    2.50         -1
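If you would rather avoid the tidyverse, a base R sketch of the same idea splits the data by group and fits one lm() per piece. The new x values (4 and 5) are hypothetical, chosen only to show predict() being applied with each group's own model.

```r
# Base R only: split the data by group and fit one model per piece
df <- data.frame(group_name = rep(c("a", "b", "c"), each = 3),
                 x = rep(1:3, times = 3),
                 y = c(2, 4, 6, 4, 3, 2, 2, 5, 8))

fits <- lapply(split(df, df$group_name), function(d) lm(y ~ x, data = d))

# Coefficients: one row per group, columns "(Intercept)" and "x"
coefs <- t(sapply(fits, coef))
coefs

# Predictions for hypothetical new x values, one column per group
new_x <- data.frame(x = c(4, 5))
preds <- sapply(fits, predict, newdata = new_x)
preds
```

split() returns a named list, so the group names carry through to the rows of coefs and the columns of preds.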
