I would like to do something in between mutate and summarize: calculate a summary statistic on groups, but retain the original data as a nested object. I assume this is a pretty generic task, but I can't figure out how to do it without a join and grouping twice. Example code below:
library(dplyr)
library(tidyr)
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  left_join(mtcars %>%
              group_by(cyl) %>%
              summarise(mean_mpg = mean(mpg)),
            by = "cyl")
which produces the desired output:
# A tibble: 3 x 3
cyl data mean_mpg
<dbl> <list> <dbl>
1 6 <tibble [7 x 10]> 19.74286
2 4 <tibble [11 x 10]> 26.66364
3 8 <tibble [14 x 10]> 15.10000
but I feel like this is not the "correct" way to do this.
Here is one way to do this without a join: use map_dbl from the purrr package (a member of the tidyverse family), which is essentially map with the output returned as a vector of type double, to calculate the mean of mpg within each tibble nested in the data column:
library(dplyr)
library(tidyr)
library(purrr)
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(mean_mpg = map_dbl(data, ~ mean(.x$mpg)))
# A tibble: 3 x 3
# cyl data mean_mpg
# <dbl> <list> <dbl>
#1 6 <tibble [7 x 10]> 19.74286
#2 4 <tibble [11 x 10]> 26.66364
#3 8 <tibble [14 x 10]> 15.10000
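To make the map/map_dbl distinction concrete, here is a small sketch that skips nesting and uses base R's split() to build a named list of data frames, one per cyl value. The by_cyl list is an illustrative stand-in for the data list column, not part of the answer above. map() returns a list, while map_dbl() applies the same function and returns a named double vector, which is why mutate() gets a plain numeric mean_mpg column rather than a list column:

```r
library(purrr)

# Illustrative stand-in for the nested data column:
# a named list of data frames, one per cylinder count ("4", "6", "8")
by_cyl <- split(mtcars, mtcars$cyl)

# map() returns a list with one element per data frame
means_list <- map(by_cyl, ~ mean(.x$mpg))

# map_dbl() runs the same function but simplifies the result to a named double vector
means_dbl <- map_dbl(by_cyl, ~ mean(.x$mpg))

str(means_list)
print(means_dbl)
```

The values match the mean_mpg column in the nested output above, just in a different row order (split() orders by cyl).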
Alternatively, you can calculate mean_mpg before nesting and add mean_mpg as one of the grouping variables:
mtcars %>%
  group_by(cyl) %>%
  mutate(mean_mpg = mean(mpg)) %>%
  group_by(mean_mpg, .add = TRUE) %>%  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
  nest()
# A tibble: 3 x 3
# cyl mean_mpg data
# <dbl> <dbl> <list>
#1 6 19.74286 <tibble [7 x 10]>
#2 4 26.66364 <tibble [11 x 10]>
#3 8 15.10000 <tibble [14 x 10]>
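The second approach relies on mean_mpg being constant within each cyl group: after the grouped mutate(), every row of a group carries the same mean, so adding mean_mpg to the grouping variables does not split any group further. A quick sketch checking that assumption with dplyr's n_groups() and distinct() on the ungrouped intermediate result (the names with_mean, n_cyl, n_both and pairs are only for illustration):

```r
library(dplyr)

# Attach each cyl group's mean mpg to every row, then drop the grouping
with_mean <- mtcars %>%
  group_by(cyl) %>%
  mutate(mean_mpg = mean(mpg)) %>%
  ungroup()

# Grouping by cyl alone and by (cyl, mean_mpg) gives the same number of groups
n_cyl  <- n_groups(group_by(with_mean, cyl))
n_both <- n_groups(group_by(with_mean, cyl, mean_mpg))

# Exactly one (cyl, mean_mpg) pair per cylinder count
pairs <- distinct(with_mean, cyl, mean_mpg)

print(c(n_cyl = n_cyl, n_both = n_both))
print(pairs)
```

If the added column varied within a cyl group, grouping by it would split that group and nest() would return more, smaller tibbles.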