Alex Coppock

Reputation: 2252

purrr split %>% map %>% bind VERSUS dplyr group_by %>% do

I am often in the position of wanting to split-apply-combine regression models. I've found two ways of doing it, the "purrr" approach and the "dplyr::do()" approach.

Issue with the purrr approach: I want columns in the resulting data.frame to indicate the levels of the variables according to which the split was done, as in a normal group_by %>% summarize operation.

Issue with the dplyr::do() approach: there's a nasty tangle of do(tidy(lm_robust)) that is decidedly inelegant. But I get the columns back.

Main Q: is there a way to do split-apply-combine in purrr that returns the variable splits nicely?

The minimum working example below shows that the problem interacts with how many variables you're splitting by.

library(tidyverse)
library(estimatr) # for lm_robust

# Splitting by one variable

# the purrr approach
mtcars %>%
  split(.$am) %>%
  map(~lm_robust(mpg ~ hp, data = .)) %>%
  map_df(tidy, .id = "am") # annoying to have to type "am" again!

# the dplyr do() approach
mtcars %>%
  group_by(am) %>%
  do(tidy(lm_robust(mpg ~ hp, data = .))) # gross nesting

# Splitting by two variables

# the purrr approach??
mtcars %>%
  split(list(.$am, .$vs)) %>%
  map(~lm_robust(mpg ~ hp, data = .))  %>%
  map_df(tidy, .id = "OH NO") # the column encodes both am and vs info

# the dplyr do() approach works great
mtcars %>%
  group_by(am, vs) %>%
  do(tidy(lm_robust(mpg ~ hp, data = .))) # still nested up.
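
For the two-variable purrr case above, one possible workaround is to split the combined id column back apart afterwards. This is only a sketch: it assumes the names produced by split(list(...)) follow the default "am.vs" pattern (levels joined by a dot), and the names am_vs and by_am_vs are made up for illustration. It would break if a level itself contained a dot.

```r
library(tidyverse)
library(estimatr) # for lm_robust

by_am_vs <- mtcars %>%
  split(list(.$am, .$vs)) %>%                # list names look like "0.0", "1.0", ...
  map(~lm_robust(mpg ~ hp, data = .)) %>%
  map_df(tidy, .id = "am_vs") %>%            # "am_vs" is a placeholder column name
  separate(am_vs, into = c("am", "vs"), sep = "\\.", convert = TRUE)

by_am_vs
```

The split variables still have to be typed twice, so this recovers the columns without addressing the repetition.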

EDIT

Here's a way that uses nest() and unnest(). It's clunky, but maybe it's the best purrr approach? Inspired by http://stat545.com/block024_group-nest-split-map.html

mtcars %>%
  group_by(am, vs) %>%
  nest() %>%
  mutate(fit = map(data, ~lm_robust(mpg ~ hp, data = .)),
         tidy = map(fit, tidy)) %>%
  select(am, vs, tidy) %>%
  unnest(tidy)

EDIT 2

Here's a way with group_map that's just as ugly as do, but maybe that's just the way it goes.

mtcars %>%
  group_by(am, vs) %>%
  group_map(~tidy(lm_robust(mpg ~ hp, data = .x)))
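
A note on versions: from dplyr 0.8.1 onward, group_map() returns a plain list, and the data-frame-returning behaviour shown above lives in group_modify(). A minimal sketch of the same pipeline under that assumption (by_group is just an illustrative name):

```r
library(tidyverse)
library(estimatr) # for lm_robust

by_group <- mtcars %>%
  group_by(am, vs) %>%
  # .x is the data for the current group, without the grouping columns
  group_modify(~tidy(lm_robust(mpg ~ hp, data = .x)))

by_group
```

The result is a grouped tibble with am and vs as leading columns; add ungroup() if the grouping is not wanted downstream.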

EDIT 3:

I guess what would seem beautiful to me is a pipeline that does one thing per line, something like the code below. That said, I respect the comments below saying that group_map is already pretty close.

# does not work
mtcars %>%
  group_by(am, vs) %>%
  map(~lm_robust(mpg ~ hp, data = .)) %>%
  map_df(tidy)
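
Something close to that one-step-per-line shape may be possible with nest_by(), which produces a rowwise data frame with a list column named data. This is a sketch, assuming dplyr >= 1.1.0 for reframe(); per_line is an illustrative name, not anything from the packages.

```r
library(tidyverse)
library(estimatr) # for lm_robust

per_line <- mtcars %>%
  nest_by(am, vs) %>%                                       # one row per group, data in a list column
  mutate(fit = list(lm_robust(mpg ~ hp, data = data))) %>%  # fit one model per row
  reframe(tidy(fit))                                        # tidy each fit; am and vs are kept

per_line
```

Each line does one job (group, fit, tidy), at the cost of wrapping the model in list().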

Upvotes: 4

Views: 2462

Answers (1)

Mark

Reputation: 12528

A one-liner:

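# needs dplyr >= 1.1.0 for reframe() and .by
# pick(everything()) is the current group's rows; a bare `.` here would be the whole piped data frame
mtcars %>% reframe(tidy(lm_robust(mpg ~ hp, data = pick(everything()))), .by = c(am, vs))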

Upvotes: 1
