Fleaf

Reputation: 71

Running linear models for groups within dataframe and storing outputs in dataframe in R

I am trying to run multiple linear models on a very large dataset and store the outputs in a dataframe. I have managed to get the estimates and p-values into a dataframe (see below), but I also want to store the AIC for each model.

#example dataframe

dt = data.frame(x = rnorm(40, 5, 5),
                y = rnorm(40, 3, 4),
                group = rep(c("a","b"), 20))

library(dplyr)
library(broom)

# code that runs lm for each level of the group column and stores the output
dt_lm <- dt %>%
  group_by(group) %>%  
  do(tidy(lm(y~x, data=.)))

Upvotes: 2

Views: 441

Answers (2)

akrun

Reputation: 886938

In dplyr >= 1.0 you can also use nest_by, which nests each group's rows into a list column and returns a row-wise data frame:

library(dplyr)
library(tidyr)
library(broom)
dt %>% 
     nest_by(group) %>%
     transmute(out = list(glance(lm(y ~ x, data = data))))  %>% 
     unnest(c(out)) %>% 
     select(AIC)
# A tibble: 2 x 2
# Groups:   group [2]
#  group   AIC
#  <chr> <dbl>
#1 a      115.
#2 b      100.
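
If you also need the fitted models later (for coefficients, residuals, and so on), one option is to keep them in a list column and derive the AIC from there. This is only a sketch: the column names `fit` and `AIC` and the `set.seed()` call are choices made for illustration, and `dt` is rebuilt here so the block runs on its own.

```r
library(dplyr)
library(broom)

# Rebuild the example data; the seed is an assumption added for reproducibility
set.seed(1)
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 group = rep(c("a", "b"), 20))

# One row per group, with the fitted lm stored in the list column `fit`
models <- dt %>%
  nest_by(group) %>%
  mutate(fit = list(lm(y ~ x, data = data)))

# In a row-wise data frame `fit` refers to the single model in that row,
# so stats::AIC() can be applied to it directly
aic_by_group <- models %>%
  mutate(AIC = AIC(fit)) %>%
  ungroup() %>%
  select(group, AIC)

aic_by_group
```

The `models` object can then be reused, for example with `tidy()` on each element of `fit`, without refitting.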

Upvotes: 2

deschen

Reputation: 10996

Use glance instead of tidy:

dt_lm <- dt %>%
  group_by(group) %>%
  do(glance(lm(y~x, data=.))) %>%
  select(AIC)

which gives:

Adding missing grouping variables: `group`
# A tibble: 2 x 2
# Groups:   group [2]
  group   AIC
  <chr> <dbl>
1 a      119.
2 b      114.

If you want to store other model metrics besides the AIC, just skip the select() step.
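
Since the question asks for estimates, p-values, and the AIC in one dataframe, the two pieces can be combined. The following is a minimal sketch, assuming dplyr >= 0.8.1 (for `group_modify`); the `set.seed()` call and the helper name `aic_value` are illustrative additions, not part of the original code.

```r
library(dplyr)
library(broom)

# Rebuild the example data; the seed is an assumption added for reproducibility
set.seed(1)
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 group = rep(c("a", "b"), 20))

dt_lm <- dt %>%
  group_by(group) %>%
  group_modify(~ {
    fit <- lm(y ~ x, data = .x)   # fit one model per group
    aic_value <- AIC(fit)         # same quantity glance() reports as AIC
    # tidy() gives one row per coefficient; the group's AIC is repeated on each row
    tidy(fit) %>% mutate(AIC = aic_value)
  }) %>%
  ungroup()

dt_lm
```

Each group contributes one row per model term (intercept and `x`), with the columns from `tidy()` plus the group's AIC.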

Upvotes: 4
