kmace

Adding summarize output to original tibble

I would like to do something in between mutate and summarize.

I would like to calculate a summary statistic on groups but retain the original data as a nested object. I assume this is a fairly generic task, but I can't figure out how to do it without a join and grouping twice. Example code below:

library(dplyr)
library(tidyr)

mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  left_join(mtcars %>% 
              group_by(cyl) %>% 
              summarise(mean_mpg = mean(mpg)),
            by = "cyl")

which produces the desired output:

# A tibble: 3 x 3
    cyl               data mean_mpg
  <dbl>             <list>    <dbl>
1     6  <tibble [7 x 10]> 19.74286
2     4 <tibble [11 x 10]> 26.66364
3     8 <tibble [14 x 10]> 15.10000

but I feel like this is not the "correct" way to do this.

Answers (1)

akuiper

Here is one way to do this without a join: use map_dbl from the purrr package (part of the tidyverse) to calculate the mean of mpg within each data frame in the nested data column. map_dbl is essentially map, except that the result is a double vector rather than a list:

library(dplyr)
library(tidyr)
library(purrr)

mtcars %>% 
    group_by(cyl) %>% 
    nest() %>% 
    mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))

# A tibble: 3 x 3
#    cyl               data mean_mpg
#  <dbl>             <list>    <dbl>
#1     6  <tibble [7 x 10]> 19.74286
#2     4 <tibble [11 x 10]> 26.66364
#3     8 <tibble [14 x 10]> 15.10000
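
To make the map_dbl call concrete, here is a small sketch on hypothetical toy data (a named list of data frames standing in for the nested data column, not mtcars). In the formula shorthand ~ mean(.x$mpg), .x is the current list element, so it is equivalent to an ordinary anonymous function. map_dbl also keeps the list's names and throws an error if any result is not a single number.

```r
library(purrr)

# Hypothetical toy data: a named list of small data frames,
# standing in for the nested `data` column
dfs <- list(
  a = data.frame(mpg = c(1, 2, 3)),
  b = data.frame(mpg = c(2, 4))
)

# purrr formula shorthand: `.x` is the current list element
via_formula <- map_dbl(dfs, ~ mean(.x$mpg))

# The same computation written as an ordinary anonymous function
via_function <- map_dbl(dfs, function(df) mean(df$mpg))

via_formula
#> a b
#> 2 3
```

Inside mutate(), the same call runs over the list column data, producing one mean per row of the nested tibble.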

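Or you can calculate mean_mpg before nesting and add it as an extra grouping variable. Because mean_mpg is constant within each cyl group, this does not split any group further, and nest() keeps grouping columns outside the nested data: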

library(dplyr)
library(tidyr)

mtcars %>% 
    group_by(cyl) %>% 
    mutate(mean_mpg = mean(mpg)) %>%
    # dplyr >= 1.0.0 spells this `.add`; older versions used `add`
    group_by(mean_mpg, .add = TRUE) %>%
    nest()

# A tibble: 3 x 3
#    cyl mean_mpg               data
#  <dbl>    <dbl>             <list>
#1     6 19.74286  <tibble [7 x 10]>
#2     4 26.66364 <tibble [11 x 10]>
#3     8 15.10000 <tibble [14 x 10]>
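
As a sanity check, a sketch like the following runs both approaches and confirms they give the same groups, the same means and the same nested row counts. It assumes dplyr >= 1.0.0 (for .add) and tidyr >= 1.0.0. The names via_map and via_group are purely illustrative. Both results are sorted by cyl first so the comparison does not depend on the row order nest() happens to produce.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Approach 1: nest first, then summarise each nested data frame
via_map <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg))) %>%
  arrange(cyl)

# Approach 2: compute the per-group mean, then nest by cyl and mean_mpg
via_group <- mtcars %>%
  group_by(cyl) %>%
  mutate(mean_mpg = mean(mpg)) %>%
  group_by(mean_mpg, .add = TRUE) %>%
  nest() %>%
  ungroup() %>%
  arrange(cyl)

# Same groups, same means, same number of rows in each nested tibble
stopifnot(
  identical(via_map$cyl, via_group$cyl),
  isTRUE(all.equal(via_map$mean_mpg, via_group$mean_mpg)),
  identical(map_int(via_map$data, nrow), map_int(via_group$data, nrow))
)
```

Adding mean_mpg as a grouping variable only works because it takes exactly one value per cyl group. A column that varied within a group would split that group into several nested tibbles.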
