dplyr'ish approach to simple subset/summarise function

Question

Background

The function below does the following:

Subsets the provided data frame using a user-provided expression
Selects the desired column
Applies the chosen summary function to the resulting vector and returns a scalar

`base` approach

summarise_filtered <-
    function(df,
             subset_arg,
             summary_fun = c("min", "max", "median"),
             select_col) {
        summary_fun <- match.arg(summary_fun)

        sbst_vals <-
            subset.data.frame(
                df,
                subset = eval(parse(text = subset_arg)),
                drop = TRUE,
                select = eval(parse(text = select_col))
            )

        do.call(match.fun(summary_fun), list(sbst_vals))

    }

Results

summarise_filtered(mtcars, "am == 1", "min", "cyl")
# [1] 4
summarise_filtered(mtcars, "am == 1", "max", "cyl")
# [1] 8

Challenge

I'm interested in re-writing the function above using dplyr pipe syntax. My initial attempt fulfils the basic requirements:

summarise_filtered_dplyrish <-
    function(df,
             subset_arg,
             summary_fun,
             select_col) {

        df %>%
            filter({{subset_arg}}) %>%
            summarise(across(.cols = {{select_col}}, .fns = summary_fun)) %>%
            pull({{select_col}})

    }

When called:

summarise_filtered_dplyrish(mtcars, am == 1, min, cyl)
# [1] 4

Problem

I would like for the function to work using:

summarise_filtered_dplyrish(mtcars, "am == 1", "min", "cyl")

syntax, in addition to the already working unquoted form. How can I do this? So far, the call above generates an error:

Error

Error: Problem with filter() input ..1.
x Input ..1 must be a logical vector, not a character.
ℹ Input ..1 is "am == 1".
Run rlang::last_error() to see where the error occurred.

Artem Sokolov · Accepted Answer

min and cyl can be easily handled by ensym(), which works with both strings and symbols. The expression am == 1 requires a little bit more work. Let's define a helper function that parses an object only if it's a string:

str2expr <- function(.x) {if( is.character(.x) ) rlang::parse_expr(.x) else .x}
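
To see what the helper does in isolation, here is a small sketch. The variable names from_string and from_expr are only illustrative; the helper itself is copied from the line above.

```r
library(rlang)

# Helper from the answer: parse the input only when it is a string
str2expr <- function(.x) {if( is.character(.x) ) rlang::parse_expr(.x) else .x}

from_string <- str2expr("am == 1")       # string is parsed into a call
from_expr   <- str2expr(quote(am == 1))  # an existing call passes through untouched

is.call(from_string)            # TRUE
is.call(from_expr)              # TRUE

# The parsed call can be evaluated against the data like any other expression
sum(eval(from_string, mtcars))  # 13 rows of mtcars have am == 1
```

Either way the result is a call object, which is what filter() needs once it is injected with !!.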

We can now capture the argument provided to subset_arg and parse it if it's a string:

summarise_filtered_dplyrish <-
    function(df,
             subset_arg,
             summary_fun,
             select_col) {

        subset_expr <- rlang::enexpr(subset_arg) %>% str2expr()

        df %>%
            filter( !!subset_expr ) %>%
            summarise(across(.cols = {{select_col}}, .fns = !!ensym(summary_fun))) %>%
            pull( !!ensym(select_col) )
    }

summarise_filtered_dplyrish( mtcars, am == 1, min, cyl )        # Works
summarise_filtered_dplyrish( mtcars, "am == 1", "min", "cyl" )  # Also works
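
Because each argument is handled on its own, the two styles should also be mixable within a single call. The sketch below repeats the function from the answer (with rlang:: prefixes added so it does not rely on dplyr's re-exports) and checks the results against the values reported in the question. The variable names res_bare, res_string and res_mixed are only illustrative.

```r
library(dplyr)

str2expr <- function(.x) {if( is.character(.x) ) rlang::parse_expr(.x) else .x}

summarise_filtered_dplyrish <-
    function(df, subset_arg, summary_fun, select_col) {
        # Capture the filter condition, parsing it if it arrived as a string
        subset_expr <- rlang::enexpr(subset_arg) %>% str2expr()

        df %>%
            filter( !!subset_expr ) %>%
            summarise(across(.cols = {{select_col}},
                             .fns = !!rlang::ensym(summary_fun))) %>%
            pull( !!rlang::ensym(select_col) )
    }

res_bare   <- summarise_filtered_dplyrish(mtcars, am == 1, min, cyl)
res_string <- summarise_filtered_dplyrish(mtcars, "am == 1", "min", "cyl")
res_mixed  <- summarise_filtered_dplyrish(mtcars, "am == 1", max, cyl)

# Fails loudly if any of the call styles disagrees with the base version
stopifnot(res_bare == 4, res_string == 4, res_mixed == 8)
```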

Brief explanation: {{x}} is shorthand for !!enquo(x), which captures both the expression supplied to the function argument and the environment in which that expression should be evaluated. Since your evaluation context is effectively defined by df, it's OK to relax enquo() to enexpr(), which captures the expression but not its environment, and to ensym(), which captures a symbol or a string naming one.
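
The difference between the three capture functions can be seen in a minimal sketch. The wrappers capture_quo, capture_expr and capture_sym are hypothetical names introduced only for this illustration.

```r
library(rlang)

# Hypothetical wrappers, one per capture function
capture_quo  <- function(x) enquo(x)   # expression plus its environment
capture_expr <- function(x) enexpr(x)  # expression only
capture_sym  <- function(x) ensym(x)   # symbol, or a string turned into a symbol

q <- capture_quo(am == 1)
is_quosure(q)                                       # TRUE
identical(quo_get_expr(q), capture_expr(am == 1))   # TRUE: same expression inside

# ensym() treats a bare name and a string the same way
identical(capture_sym(cyl), capture_sym("cyl"))     # TRUE

# enexpr() leaves a string as a string, which is why subset_arg
# needs the extra parsing step
is.character(capture_expr("am == 1"))               # TRUE
```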
