I am often in the position of wanting to split-apply-combine regression models. I've found two ways of doing it: the "purrr" approach and the "dplyr::do()" approach.
Issue with the purrr approach: I want columns in the resulting data.frame to indicate the levels of the variables according to which the split was done, as in a normal group_by %>% summarize operation.
Issue with the dplyr::do() approach: there's a nasty tangle of do(tidy(lm_robust(...))) that is decidedly inelegant, but I do get the columns back.
Main question: is there a way to do split-apply-combine in purrr that returns the splitting variables as proper columns?
The minimal working example below shows that the problem gets worse as the number of variables you split by grows.
library(tidyverse)
library(estimatr) # for lm_robust
# Splitting by one variable
# the purrr approach
mtcars %>%
split(.$am) %>%
map(~lm_robust(mpg ~ hp, data = .)) %>%
map_df(tidy, .id = "am") # annoying to have to type "am" again!
# the dplyr do() approach
mtcars %>%
group_by(am) %>%
do(tidy(lm_robust(mpg ~ hp, data = .))) # gross nesting
# Splitting by two variables
# the purrr approach??
mtcars %>%
split(list(.$am, .$vs)) %>%
map(~lm_robust(mpg ~ hp, data = .)) %>%
map_df(tidy, .id = "OH NO") # the column encodes both am and vs info
# the dplyr do() approach works great
mtcars %>%
group_by(am, vs) %>%
do(tidy(lm_robust(mpg ~ hp, data = .))) # still nested up.
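Within the purrr approach, one possible workaround (a sketch, not the only option) is to split the combined .id column back into its parts. When split() is given a list of grouping vectors, it names each piece with the levels joined by "." (for example "1.0" for am = 1, vs = 0), so tidyr's separate() can undo that. The column name am_vs is an illustrative choice, and this assumes the level values themselves contain no "." characters. separate() is superseded in recent tidyr releases but still available.

```r
library(tidyverse)
library(estimatr) # for lm_robust and its tidy() method

fits_split <- mtcars %>%
  # piece names look like "0.1", i.e. "<am level>.<vs level>"
  split(list(.$am, .$vs)) %>%
  map(~ lm_robust(mpg ~ hp, data = .)) %>%
  # keep the piece names in a temporary id column
  map_df(tidy, .id = "am_vs") %>%
  # split "0.1" back into am = 0, vs = 1; convert = TRUE makes them numeric
  separate(am_vs, into = c("am", "vs"), sep = "\\.", convert = TRUE)

fits_split
```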
EDIT
Here's a way that uses nest() and unnest(). It's clunky, but maybe it's the best purrr approach? Inspired by http://stat545.com/block024_group-nest-split-map.html
mtcars %>%
group_by(am, vs) %>%
nest() %>%
mutate(fit = map(data, ~lm_robust(mpg ~ hp, data = .)),
tidy = map(fit, tidy)) %>%
select(am, vs, tidy) %>%
unnest(tidy)
EDIT 2
Here's a way with group_map() that's just as ugly as do(), but maybe that's just the way it goes.
mtcars %>%
group_by(am, vs) %>%
group_map(~tidy(lm_robust(mpg ~ hp, data = .x)))
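For comparison, group_map() returns a plain list without the am and vs keys. Its sibling group_modify(), available in recent dplyr versions, instead expects the function to return a data frame, row-binds the results and keeps the grouping columns. A minimal sketch, assuming tidy() for lm_robust objects is available after loading estimatr, as in the question:

```r
library(tidyverse)
library(estimatr) # for lm_robust and its tidy() method

fits <- mtcars %>%
  group_by(am, vs) %>%
  # .x is each group's rows (without am/vs); the tidy() output is
  # row-bound and am/vs are added back as columns
  group_modify(~ tidy(lm_robust(mpg ~ hp, data = .x))) %>%
  ungroup()

fits
```

This gives one row per group and coefficient (8 rows here: 4 am/vs combinations times 2 terms), with am and vs as ordinary columns.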
EDIT 3:
I guess what would seem beautiful to me is something that does one thing per line, like the code below, but I respect the comments saying that, geez, group_map() is pretty close.
# does not work: map() iterates over the columns of the grouped data frame, not over the groups
mtcars %>%
group_by(am, vs) %>%
map(~lm_robust(mpg ~ hp, data = .)) %>%
map_df(tidy)
A one-liner:
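# pick(everything()) is the current group's rows (dplyr >= 1.1.0); a bare `.` here would be all of mtcars
mtcars %>% reframe(tidy(lm_robust(mpg ~ hp, data = pick(everything()))), .by = c(am, vs))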