I am trying to find the correct syntax for a scoped summarise
function (using dplyr 1.0.2).
Here is my unscoped version:
mtcars %>%
group_by(am, gear) %>%
summarise(sum = sum(disp), n = n(), prop = sum(disp) / n())
But the scoped version doesn't work. I've tried different options without success:
mtcars %>%
group_by(am, gear) %>%
summarise_if(is.double, list(sum = sum(), n = n(), prop = sum() / n()))
Error: `n()` must only be used inside dplyr verbs.
This one doesn't work either:
mtcars %>%
group_by(am, gear) %>%
summarise_if(is.double, ~ sum(.x), ~ n(), ~sum() / n())
With the newer version, we can use across() instead of the _if suffix:
library(dplyr)
mtcars %>%
  group_by(am, gear) %>%
  summarise(across(where(is.double),
                   list(sum = ~ sum(.), n = ~ n(), prop = ~ sum(.) / n())),
            .groups = 'drop')
-output
# A tibble: 4 x 29
# am gear mpg_sum mpg_n mpg_prop cyl_sum cyl_n cyl_prop disp_sum disp_n disp_prop hp_sum hp_n hp_prop drat_sum drat_n drat_prop wt_sum
# <dbl> <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl>
#1 0 3 242. 15 16.1 112 15 7.47 4894. 15 326. 2642 15 176. 47.0 15 3.13 58.4
#2 0 4 84.2 4 21.0 20 4 5 623. 4 156. 403 4 101. 15.4 4 3.86 13.2
#3 1 4 210. 8 26.3 36 8 4.5 854. 8 107. 671 8 83.9 33.1 8 4.13 18.2
#4 1 5 107. 5 21.4 30 5 6 1012. 5 202. 978 5 196. 19.6 5 3.92 13.2
# … with 11 more variables: wt_n <int>, wt_prop <dbl>, qsec_sum <dbl>, qsec_n <int>, qsec_prop <dbl>, vs_sum <dbl>, vs_n <int>, vs_prop <dbl>,
# carb_sum <dbl>, carb_n <int>, carb_prop <dbl>
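The attempts in the question most likely fail because sum() and n() are called immediately while the list is being built, instead of being passed as functions or ~ lambdas that dplyr can evaluate per group. As a minimal sketch (the names unscoped and scoped are only illustrative), the across() form restricted to disp can be checked against the unscoped version from the question:

```r
library(dplyr)

# Unscoped version from the question (with .groups = "drop" added).
unscoped <- mtcars %>%
  group_by(am, gear) %>%
  summarise(sum = sum(disp), n = n(), prop = sum(disp) / n(),
            .groups = "drop")

# Same computation via across(), restricted to the single column disp.
# Each list element is a lambda, so nothing is evaluated until dplyr
# applies it to the column within each group.
scoped <- mtcars %>%
  group_by(am, gear) %>%
  summarise(across(disp, list(sum  = ~ sum(.x),
                              n    = ~ n(),
                              prop = ~ sum(.x) / n())),
            .groups = "drop")

# The result columns are named disp_sum, disp_n and disp_prop.
print(all.equal(scoped$disp_sum, unscoped$sum))
print(all.equal(scoped$disp_prop, unscoped$prop))
```

Both comparisons should agree if the lambdas are equivalent to the hand-written expressions.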
Or using summarise_if (superseded by across() since dplyr 1.0.0, but still working):
mtcars %>%
group_by(am, gear) %>%
summarise_if(is.double, list(sum = ~sum(.), n = ~n(), prop = ~sum(.) / n()))
# A tibble: 4 x 29
# Groups: am [2]
# am gear mpg_sum cyl_sum disp_sum hp_sum drat_sum wt_sum qsec_sum vs_sum carb_sum mpg_n cyl_n disp_n hp_n drat_n wt_n qsec_n vs_n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <int> <int>
#1 0 3 242. 112 4894. 2642 47.0 58.4 265. 3 40 15 15 15 15 15 15 15 15
#2 0 4 84.2 20 623. 403 15.4 13.2 80.1 4 12 4 4 4 4 4 4 4 4
#3 1 4 210. 36 854. 671 33.1 18.2 147. 6 16 8 8 8 8 8 8 8 8
#4 1 5 107. 30 1012. 978 19.6 13.2 78.2 1 22 5 5 5 5 5 5 5 5
# … with 10 more variables: carb_n <int>, mpg_prop <dbl>, cyl_prop <dbl>, disp_prop <dbl>, hp_prop <dbl>, drat_prop <dbl>, wt_prop <dbl>,
# qsec_prop <dbl>, vs_prop <dbl>, carb_prop <dbl>
Using n() for every column gives the same value, because it is the row count of each group and does not depend on the column. It may be better to compute it once, outside across() (one of the flexibilities that across() offers):
mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(),
            across(where(is.double),
                   list(sum = ~ sum(.), prop = ~ sum(.) / n)),
            .groups = 'drop')
-output
# A tibble: 4 x 21
# am gear n mpg_sum mpg_prop cyl_sum cyl_prop disp_sum disp_prop hp_sum hp_prop drat_sum drat_prop wt_sum wt_prop qsec_sum qsec_prop
# <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0 3 15 242. 16.1 112 7.47 4894. 326. 2642 176. 47.0 3.13 58.4 3.89 265. 17.7
#2 0 4 4 84.2 21.0 20 5 623. 156. 403 101. 15.4 3.86 13.2 3.30 80.1 20.0
#3 1 4 8 210. 26.3 36 4.5 854. 107. 671 83.9 33.1 4.13 18.2 2.27 147. 18.4
#4 1 5 5 107. 21.4 30 6 1012. 202. 978 196. 19.6 3.92 13.2 2.63 78.2 15.6
# … with 4 more variables: vs_sum <dbl>, vs_prop <dbl>, carb_sum <dbl>, carb_prop <dbl>
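As a side note, sum(.)/n is simply the group mean, so the prop columns can be cross-checked against mean(). The sketch below assumes there are no missing values in the data (true for mtcars); by_prop and by_mean are illustrative names, not part of the original answer:

```r
library(dplyr)

# prop computed as in the answer: column sum divided by the group count n,
# where n is created once, before across().
by_prop <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(),
            across(where(is.double), ~ sum(.x) / n),
            .groups = "drop")

# The same quantity computed directly with mean().
by_mean <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n(),
            across(where(is.double), mean),
            .groups = "drop")

# Compare as plain data frames, allowing for floating-point tolerance.
print(all.equal(as.data.frame(by_prop), as.data.frame(by_mean)))
```

With a single function passed to across(), the output columns keep their original names (mpg, cyl, ...) rather than gaining a suffix.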