Paul

Reputation: 223

Correct syntax for scoped summarise and n()

I am trying to find the correct syntax for a scoped summarise function (using dplyr 1.0.2).

Here is my unscoped version:

mtcars %>%
  group_by(am, gear) %>%
  summarise(sum = sum(disp), n = n(), prop = sum(disp) / n())

But the scoped version doesn't work. I've tried different options without success:

mtcars %>%
  group_by(am, gear) %>%
  summarise_if(is.double, list(sum = sum(), n = n(), prop = sum() / n()))

Error: `n()` must only be used inside dplyr verbs.

This one doesn't work either:

mtcars %>%
  group_by(am, gear) %>%
  summarise_if(is.double, ~ sum(.x), ~ n(), ~sum() / n())

Upvotes: 1

Views: 89

Answers (1)

akrun

Reputation: 887811

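With dplyr 1.0.0 and later, we can use across() inside summarise() instead of the _if suffix. Each function is passed as a lambda (~), so it is not evaluated until it is applied to a column: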

library(dplyr)
mtcars %>%
  group_by(am, gear) %>%
  summarise(across(where(is.double),
                   list(sum = ~ sum(.), n = ~ n(), prop = ~ sum(.) / n())),
            .groups = 'drop')

-output

# A tibble: 4 x 29
#     am  gear mpg_sum mpg_n mpg_prop cyl_sum cyl_n cyl_prop disp_sum disp_n disp_prop hp_sum  hp_n hp_prop drat_sum drat_n drat_prop wt_sum
#  <dbl> <dbl>   <dbl> <int>    <dbl>   <dbl> <int>    <dbl>    <dbl>  <int>     <dbl>  <dbl> <int>   <dbl>    <dbl>  <int>     <dbl>  <dbl>
#1     0     3   242.     15     16.1     112    15     7.47    4894.     15      326.   2642    15   176.      47.0     15      3.13   58.4
#2     0     4    84.2     4     21.0      20     4     5        623.      4      156.    403     4   101.      15.4      4      3.86   13.2
#3     1     4   210.      8     26.3      36     8     4.5      854.      8      107.    671     8    83.9     33.1      8      4.13   18.2
#4     1     5   107.      5     21.4      30     5     6       1012.      5      202.    978     5   196.      19.6      5      3.92   13.2
# … with 11 more variables: wt_n <int>, wt_prop <dbl>, qsec_sum <dbl>, qsec_n <int>, qsec_prop <dbl>, vs_sum <dbl>, vs_n <int>, vs_prop <dbl>,
#   carb_sum <dbl>, carb_n <int>, carb_prop <dbl>

Or with summarise_if, passing the functions as lambdas (~) instead of calling them:

mtcars %>%
  group_by(am, gear) %>%
  summarise_if(is.double, list(sum = ~ sum(.), n = ~ n(), prop = ~ sum(.) / n()))

-output

# A tibble: 4 x 29
# Groups:   am [2]
#     am  gear mpg_sum cyl_sum disp_sum hp_sum drat_sum wt_sum qsec_sum vs_sum carb_sum mpg_n cyl_n disp_n  hp_n drat_n  wt_n qsec_n  vs_n
#  <dbl> <dbl>   <dbl>   <dbl>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl> <int> <int>  <int> <int>  <int> <int>  <int> <int>
#1     0     3   242.      112    4894.   2642     47.0   58.4    265.       3       40    15    15     15    15     15    15     15    15
#2     0     4    84.2      20     623.    403     15.4   13.2     80.1      4       12     4     4      4     4      4     4      4     4
#3     1     4   210.       36     854.    671     33.1   18.2    147.       6       16     8     8      8     8      8     8      8     8
#4     1     5   107.       30    1012.    978     19.6   13.2     78.2      1       22     5     5      5     5      5     5      5     5
# … with 10 more variables: carb_n <int>, mpg_prop <dbl>, cyl_prop <dbl>, disp_prop <dbl>, hp_prop <dbl>, drat_prop <dbl>, wt_prop <dbl>,
#   qsec_prop <dbl>, vs_prop <dbl>, carb_prop <dbl>
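
To see why the attempts in the question failed, it may help to compare a function call with a lambda. This is only a minimal sketch: the names `called`, `fns` and `res` are made up for illustration, and the column selection is narrowed to `disp` and `hp` to keep the result small.

```r
library(dplyr)

# sum() with no arguments runs immediately and returns a number,
# so list(sum = sum()) would store a value, not a function.
called <- sum()
is.function(called)   # a plain value
is.function(sum)      # the function itself

# Formula lambdas are only evaluated when dplyr applies them to a column,
# so n() is called inside the verb, where it is allowed.
fns <- list(sum = ~ sum(.x), n = ~ n(), prop = ~ sum(.x) / n())

res <- mtcars %>%
  group_by(am, gear) %>%
  summarise(across(c(disp, hp), fns), .groups = "drop")

names(res)
```

The result columns follow the `{column}_{function}` pattern seen in the outputs above, for example `disp_sum`, `disp_n` and `disp_prop`.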

Using n() for every column gives the same value each time, because it is the row count of each group and does not depend on the column. It may be better to compute it once outside across(), which is one of the flexibilities across() offers:

mtcars %>%
  group_by(am, gear) %>%
  # n is computed once per group; inside across(), `n` refers to that new column
  summarise(n = n(),
            across(where(is.double),
                   list(sum = ~ sum(.), prop = ~ sum(.) / n)),
            .groups = 'drop')

-output

# A tibble: 4 x 21
#     am  gear     n mpg_sum mpg_prop cyl_sum cyl_prop disp_sum disp_prop hp_sum hp_prop drat_sum drat_prop wt_sum wt_prop qsec_sum qsec_prop
#  <dbl> <dbl> <int>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>     <dbl>  <dbl>   <dbl>    <dbl>     <dbl>  <dbl>   <dbl>    <dbl>     <dbl>
#1     0     3    15   242.      16.1     112     7.47    4894.      326.   2642   176.      47.0      3.13   58.4    3.89    265.       17.7
#2     0     4     4    84.2     21.0      20     5        623.      156.    403   101.      15.4      3.86   13.2    3.30     80.1      20.0
#3     1     4     8   210.      26.3      36     4.5      854.      107.    671    83.9     33.1      4.13   18.2    2.27    147.       18.4
#4     1     5     5   107.      21.4      30     6       1012.      202.    978   196.      19.6      3.92   13.2    2.63     78.2      15.6
# … with 4 more variables: vs_sum <dbl>, vs_prop <dbl>, carb_sum <dbl>, carb_prop <dbl>
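
As a quick sanity check, the scoped result can be compared with the unscoped version from the question for the `disp` column. This is a sketch under the assumption that the `n` column created first is visible inside the lambda, as the output above suggests. The names `unscoped` and `scoped` are illustrative only.

```r
library(dplyr)

# Unscoped version from the question (disp only).
unscoped <- mtcars %>%
  group_by(am, gear) %>%
  summarise(sum = sum(disp), n = n(), prop = sum(disp) / n(),
            .groups = "drop")

# Scoped version with the group count computed once, outside across().
scoped <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(),
            across(where(is.double),
                   list(sum = ~ sum(.x), prop = ~ sum(.x) / n)),
            .groups = "drop")

# Both comparisons should agree if the two approaches are equivalent.
all.equal(unscoped$n, scoped$n)
all.equal(unscoped$prop, scoped$disp_prop)
```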

Upvotes: 2
