Doug Fir

Reputation: 21204

Use summarise and summarise_at in same dplyr chain

Suppose I want to summarise a data frame after grouping, applying different functions to different columns. How can I do that? Counting the rows per group works fine:

mtcars %>% group_by(cyl) %>% summarise(size = n())
# A tibble: 3 x 2
    cyl  size
  <dbl> <int>
1     4    11
2     6     7
3     8    14

But if I try:

mtcars %>% group_by(cyl) %>% summarise(size = n()) %>% summarise_at(vars(c(mpg, am:carb)), mean)
Error in is_string(y) : object 'carb' not found

How can I get the size of each group with n() and, in the same result, the mean of the other chosen columns?

Upvotes: 2

Views: 919

Answers (3)

akrun

Reputation: 886948

We can use data.table methods

library(data.table)
as.data.table(mtcars)[, n := .N, cyl][, lapply(.SD, mean), cyl, 
        .SDcols = c("mpg", "am", "gear", "carb", "n")]
#   cyl      mpg        am     gear     carb  n
#1:   6 19.74286 0.4285714 3.857143 3.428571  7
#2:   4 26.66364 0.7272727 4.090909 1.545455 11
#3:   8 15.10000 0.1428571 3.285714 3.500000 14

Or with tidyverse

library(tidyverse)
mtcars %>%
   add_count(cyl) %>%
   group_by(cyl) %>%
   summarise_at(vars(mpg, am:carb, n), mean)
# A tibble: 3 x 6
#    cyl   mpg    am  gear  carb     n
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     4  26.7 0.727  4.09  1.55    11
#2     6  19.7 0.429  3.86  3.43     7
#3     8  15.1 0.143  3.29  3.5     14

Or using base R

nm1 <- c("mpg", "am", "gear", "carb", "cyl")
transform(aggregate(.~ cyl, mtcars[nm1], mean), n = as.vector(table(mtcars$cyl)))
#  cyl      mpg        am     gear     carb  n
#1   4 26.66364 0.7272727 4.090909 1.545455 11
#2   6 19.74286 0.4285714 3.857143 3.428571  7
#3   8 15.10000 0.1428571 3.285714 3.500000 14

Upvotes: 1

Ronak Shah

Reputation: 388817

Since summarise drops every column that is neither a grouping column nor a summary, the second call no longer sees mpg, am, gear or carb. An alternative is to first add the group size as a new column with mutate (which leaves all other columns as they are) and then include that column in the summarise_at calculation. The count is constant within each group, so taking its mean returns the count itself.

library(dplyr)         

mtcars %>%
   group_by(cyl) %>%
   mutate(n = n()) %>%
   summarise_at(vars(mpg, am:carb, n), mean)

# A tibble: 3 x 6
#    cyl   mpg    am  gear  carb     n
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     4  26.7 0.727  4.09  1.55    11
#2     6  19.7 0.429  3.86  3.43     7
#3     8  15.1 0.143  3.29  3.5     14
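
To see why the chain in the question fails, it may help to inspect what the first summarise() hands on to the next step. A minimal sketch (the name counts is only illustrative and not part of the question):

```r
library(dplyr)

# After summarise(), only the grouping column and the new summary survive.
counts <- mtcars %>%
  group_by(cyl) %>%
  summarise(size = n())

names(counts)
# "cyl" "size"

# mpg, am, gear and carb are gone, so a later summarise_at() cannot find them.
"carb" %in% names(counts)
# FALSE
```

This is why the count has to be attached with mutate() or add_count() before the columns are collapsed.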

Upvotes: 1

bmosov01

Reputation: 599

Here is one way, using dplyr::inner_join() to combine the two summarise operations on the grouping variable:

mtcars %>% 
  group_by(cyl) %>% 
  summarise(size = n()) %>% 
  inner_join( 
    mtcars %>%
      group_by(cyl) %>%
      summarise_at(vars(c(mpg, am:carb)), mean),
    by='cyl' )

Output is:

# A tibble: 3 x 6
    cyl  size   mpg    am  gear  carb
  <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1     4    11  26.7 0.727  4.09  1.55
2     6     7  19.7 0.429  3.86  3.43
3     8    14  15.1 0.143  3.29  3.5 
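
As a possible alternative to the join, and assuming dplyr 1.0.0 or later (where across() supersedes the summarise_at() family), both summaries can probably be written in a single summarise() call. This is a sketch under that version assumption, not something taken from the answers above; the name result is only illustrative.

```r
library(dplyr)

# One summarise() call: n() for the group size, across() for the column means.
# Assumes dplyr >= 1.0.0, which introduced across().
result <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    size = n(),
    across(c(mpg, am:carb), mean)
  )

result
```

The result should have the same columns as the joined tibble (cyl, size, mpg, am, gear, carb), with size kept as an integer because it is never passed through mean().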

Upvotes: 4
