Reputation: 21204
Suppose I want to summarise a grouped data frame using different functions for different columns. How can I do that?
mtcars %>% group_by(cyl) %>% summarise(size = n())
# A tibble: 3 x 2
cyl size
<dbl> <int>
1 4 11
2 6 7
3 8 14
But if I try:
mtcars %>% group_by(cyl) %>% summarise(size = n()) %>% summarise_at(vars(c(mpg, am:carb)), mean)
Error in is_string(y) : object 'carb' not found
How can I first get the size of each group with n() and then the mean of the other chosen features?
Upvotes: 2
Views: 919
Reputation: 886948
We can use data.table methods:
library(data.table)
as.data.table(mtcars)[, n := .N, cyl][, lapply(.SD, mean), cyl,
.SDcols = c("mpg", "am", "gear", "carb", "n")]
#   cyl mpg am gear carb n
#1: 6 19.74286 0.4285714 3.857143 3.428571 7
#2: 4 26.66364 0.7272727 4.090909 1.545455 11
#3: 8 15.10000 0.1428571 3.285714 3.500000 14
Or with tidyverse
library(tidyverse)
mtcars %>%
add_count(cyl) %>%
group_by(cyl) %>%
summarise_at(vars(mpg, am:carb, n), mean)
# A tibble: 3 x 6
# cyl mpg am gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 4 26.7 0.727 4.09 1.55 11
#2 6 19.7 0.429 3.86 3.43 7
#3 8 15.1 0.143 3.29 3.5 14
Or using base R
nm1 <- c("mpg", "am", "gear", "carb", "cyl")
transform(aggregate(.~ cyl, mtcars[nm1], mean), n = as.vector(table(mtcars$cyl)))
# cyl mpg am gear carb n
#1 4 26.66364 0.7272727 4.090909 1.545455 11
#2 6 19.74286 0.4285714 3.857143 3.428571 7
#3 8 15.10000 0.1428571 3.285714 3.500000 14
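The transform() call above relies on table() and aggregate() returning the groups in the same order. A minimal base R sketch that computes the count and the means per group in one pass, so the two cannot get out of step, might look like this (cols and res are illustrative names, not part of the original answer):

```r
# Columns to average; an illustrative choice matching the question
cols <- c("mpg", "am", "gear", "carb")

# Split by cyl, then build one summary row per group
res <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
  data.frame(cyl = d$cyl[1], size = nrow(d), t(colMeans(d[cols])))
}))
rownames(res) <- NULL  # drop the group labels that rbind() uses as row names
res
```

This keeps the count as its own column (size) instead of averaging a constant, at the cost of being slower than aggregate() on large data.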
Upvotes: 1
Reputation: 388817
Since summarise drops every column that is neither a grouping variable nor a summary, the second step in your pipeline can no longer see carb. An alternative is to first add the group size as a new column with mutate (so that all other columns remain as they are) and then include that column in the summarise_at calculation.
library(dplyr)
mtcars %>%
group_by(cyl) %>%
mutate(n = n()) %>%
summarise_at(vars(mpg, am:carb, n), mean)
# A tibble: 3 x 6
# cyl mpg am gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 4 26.7 0.727 4.09 1.55 11
#2 6 19.7 0.429 3.86 3.43 7
#3 8 15.1 0.143 3.29 3.5 14
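As a hedged sketch of the same point: inspecting the intermediate result shows that only cyl and size survive the first summarise(), which is why carb cannot be found afterwards. Assuming dplyr 1.0.0 or later is installed, n() and across() can also be combined in a single summarise() call, which avoids the extra mutate step. The object names step1 and res are illustrative only.

```r
library(dplyr)

# After the first summarise(), only the grouping column and the new summary remain
step1 <- mtcars %>% group_by(cyl) %>% summarise(size = n())
names(step1)
# [1] "cyl"  "size"

# Assumption: dplyr >= 1.0.0, where across() is available
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(size = n(), across(c(mpg, am:carb), mean))
res
```

With this form, size stays an integer count rather than being passed through mean().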
Upvotes: 1
Reputation: 599
Here is one way, using dplyr::inner_join() to combine the two summarise operations on the grouping variable:
mtcars %>%
group_by(cyl) %>%
summarise(size = n()) %>%
inner_join(
mtcars %>%
group_by(cyl) %>%
summarise_at(vars(c(mpg, am:carb)), mean),
by='cyl' )
Output is:
# A tibble: 3 x 6
cyl size mpg am gear carb
<dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 4 11 26.7 0.727 4.09 1.55
2 6 7 19.7 0.429 3.86 3.43
3 8 14 15.1 0.143 3.29 3.5
Upvotes: 4