Erich Neuwirth

Reputation: 1031

group_by and summaries with variable number of variables

Using the {{var}} notation, the following code works.
The variables to be used for grouping and for summarizing can be given as parameters to my_summary.

I would like to modify my_summary so that I can give a varying number of variables for both grouping and summarizing. Can this be done?

suppressPackageStartupMessages({
  library(tidyverse)
})

set.seed(4321)
demo_df <- 
  tibble(age=as.integer(rep(c(10,20),each=10)),
         gender=rep(c("f","m"),10),
         weight=rnorm(20,70,7),
         size=rnorm(20,160,15))

my_summary <- function(df_in,group_var,summary_var){
  df_in |>
    group_by({{group_var}}) |>
    summarise_at(vars({{summary_var}}),mean)
}


my_summary(demo_df,gender,weight)



Upvotes: 0

Views: 150

Answers (2)

PaulS

Reputation: 25313

Another possible solution, allowing for multiple grouping variables:

library(tidyverse)

my_summary <- function(df_in, group_var,summary_var){
  
  df_in %>% 
    group_by(!!!group_var)  %>% 
    summarise(across({{summary_var}}, mean), .groups = "drop")
}

my_summary(demo_df, vars(age,gender), c(weight,size))

#> # A tibble: 4 × 4
#>     age gender weight  size
#>   <int> <chr>   <dbl> <dbl>
#> 1    10 f        71.5  159.
#> 2    10 m        72.4  158.
#> 3    20 f        64.3  167.
#> 4    20 m        71.6  164.

Alternatively, without vars() (which is superseded in current dplyr), capturing the grouping variables through the dots:

library(tidyverse)

my_summary <- function(df_in, summary_var , ...){
  summary_var <- enquos(summary_var)
  group_var <- enquos(...)
  
  df_in %>% 
    group_by(!!!group_var)  %>% 
    summarise(across(!!!summary_var,mean), .groups = "drop")
}

my_summary(demo_df, c(weight, size), age, gender)

#> # A tibble: 4 × 4
#>     age gender weight  size
#>   <int> <chr>   <dbl> <dbl>
#> 1    10 f        71.5  159.
#> 2    10 m        72.4  158.
#> 3    20 f        64.3  167.
#> 4    20 m        71.6  164.

Upvotes: 1

Rui Barradas

Reputation: 76402

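Use summarise(across(...)). Since across() takes a tidy-select expression, summary_var can be a range of columns such as weight:size.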

suppressPackageStartupMessages({
  library(tidyverse)
})

set.seed(4321)
demo_df <- 
  tibble(age=as.integer(rep(c(10,20),each=10)),
         gender=rep(c("f","m"),10),
         weight=rnorm(20,70,7),
         size=rnorm(20,160,15))

my_summary <- function(df_in,group_var,summary_var){
  df_in |>
    group_by({{group_var}}) |>
    summarise(across({{summary_var}}, mean))
}


my_summary(demo_df, gender, weight:size)
#> # A tibble: 2 × 3
#>   gender weight  size
#>   <chr>   <dbl> <dbl>
#> 1 f        67.9  163.
#> 2 m        72.0  161.

Created on 2022-06-09 by the reprex package (v2.0.1)
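
The function above still takes a single grouping variable, because group_by() uses data masking rather than tidy selection. One way to lift that restriction is to wrap the grouping argument in across() as well, so that both arguments accept any tidy-select expression. The following is only a sketch: my_summary_multi and its argument names are made up for illustration, and newer dplyr versions (1.1.0 and later, as far as I know) also offer pick() for the group_by() step.

```r
library(dplyr)

set.seed(4321)
demo_df <-
  tibble(age = as.integer(rep(c(10, 20), each = 10)),
         gender = rep(c("f", "m"), 10),
         weight = rnorm(20, 70, 7),
         size = rnorm(20, 160, 15))

# Hypothetical helper: both arguments are tidy-select expressions,
# e.g. c(age, gender), weight:size, or all_of(<character vector>)
my_summary_multi <- function(df_in, group_vars, summary_vars) {
  df_in |>
    # across() turns the selection into columns that group_by() can use
    group_by(across({{ group_vars }})) |>
    summarise(across({{ summary_vars }}, mean), .groups = "drop")
}

# Bare column names
res <- my_summary_multi(demo_df, c(age, gender), c(weight, size))

# Column names stored as strings
res_chr <- my_summary_multi(demo_df,
                            all_of(c("age", "gender")),
                            all_of(c("weight", "size")))
res
```

Both calls should give one row per age/gender combination, with the means of weight and size as the remaining columns.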

Upvotes: 0
