Reputation: 71
I am trying to run multiple linear models on a very large dataset and store the outputs in a data frame. I have managed to get the estimates and p-values into a data frame (see below), but I also want to store the AIC for each model.
#example dataframe
dt = data.frame(x = rnorm(40, 5, 5),
y = rnorm(40, 3, 4),
group = rep(c("a","b"), 20))
library(dplyr)
library(broom)
# run lm for each level of group and store the tidied output
dt_lm <- dt %>%
group_by(group) %>%
do(tidy(lm(y~x, data=.)))
Upvotes: 2
Views: 441
Reputation: 886938
In newer versions of dplyr (i.e. >= 1.0), we can also use nest_by:
library(dplyr)
library(tidyr)
library(broom)
dt %>%
nest_by(group) %>%
transmute(out = list(glance(lm(y ~ x, data = data)))) %>%
unnest(c(out)) %>%
select(AIC)
# A tibble: 2 x 2
# Groups: group [2]
# group AIC
# <chr> <dbl>
#1 a 115.
#2 b 100.
Upvotes: 2
Reputation: 10996
Use glance instead of tidy:
dt_lm <- dt %>%
group_by(group) %>%
do(glance(lm(y~x, data=.))) %>%
select(AIC)
which gives:
Adding missing grouping variables: `group`
# A tibble: 2 x 2
# Groups: group [2]
group AIC
<chr> <dbl>
1 a 119.
2 b 114.
If you want to store other metrics as well as the AIC, just skip the select step.
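Since the question asks for the estimates, p-values and AIC together, here is a minimal sketch of one way to get them into a single data frame while fitting each model only once. It assumes dplyr >= 0.8.1 (for group_modify) and uses stats::AIC, which should match the AIC column that glance reports for an lm fit. The set.seed call and the name dt_lm_aic are additions for illustration only and are not part of the original post.

```r
library(dplyr)
library(broom)

set.seed(1)  # assumption: added only so the sketch is reproducible
dt <- data.frame(x = rnorm(40, 5, 5),
                 y = rnorm(40, 3, 4),
                 group = rep(c("a", "b"), 20))

# Fit one lm per group, keep the tidy() coefficient table,
# and attach that model's AIC to every coefficient row
dt_lm_aic <- dt %>%
  group_by(group) %>%
  group_modify(~ {
    fit <- lm(y ~ x, data = .x)        # .x is the data for the current group
    mutate(tidy(fit), AIC = AIC(fit))  # estimate, p.value, ... plus AIC
  }) %>%
  ungroup()

dt_lm_aic
```

The result has one row per coefficient per group, so the AIC value is repeated across the rows of the same model. If a one-row-per-model table is preferred, the glance approach above is the simpler choice.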
Upvotes: 4