Konrad

Reputation: 18657

dplyr'ish approach to simple subset/summarise function

Background

The provided function achieves the following:

base approach

summarise_filtered <-
    function(df,
             subset_arg,
             summary_fun = c("min", "max", "median"),
             select_col) {
        summary_fun <- match.arg(summary_fun)

        sbst_vals <-
            subset.data.frame(
                df,
                subset = eval(parse(text = subset_arg)),
                drop = TRUE,
                select = eval(parse(text = select_col))
            )

        do.call(match.fun(summary_fun), list(sbst_vals))

    }

Results

summarise_filtered(mtcars, "am == 1", "min", "cyl")
# [1] 4
summarise_filtered(mtcars, "am == 1", "max", "cyl")
# [1] 8

Challenge

I'm interested in re-writing the function above using dplyr pipe syntax. My initial attempt fulfils the basic requirements:

library(dplyr)

summarise_filtered_dplyrish <-
    function(df,
             subset_arg,
             summary_fun,
             select_col) {

        df %>%
            filter({{subset_arg}}) %>%
            summarise(across(.cols = {{select_col}}, .fns = summary_fun)) %>%
            pull({{select_col}})

    }

When called:

summarise_filtered_dplyrish(mtcars, am == 1, min, cyl)
# [1] 4

Problem

I would like for the function to work using:

summarise_filtered_dplyrish(mtcars, "am == 1", "min", "cyl")

syntax, in addition to the already working solution. How can I do this? So far, the call above generates the following error:

Error

Error: Problem with filter() input ..1.
x Input ..1 must be a logical vector, not a character.
ℹ Input ..1 is "am == 1".
Run rlang::last_error() to see where the error occurred.
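To illustrate what the error is complaining about, here is a minimal base R sketch. The variable names `cond_string` and `cond_values` are made up for illustration. It shows that the string itself is only a character vector, while parsing it and evaluating it against the columns of `mtcars` produces the logical vector that a filtering step needs.

```r
# The string is a character vector, not a logical condition
cond_string <- "am == 1"
is.logical(cond_string)   # FALSE

# Parsing the string and evaluating it against the data frame's columns
# yields one TRUE/FALSE value per row
cond_values <- eval(parse(text = cond_string), envir = mtcars)
is.logical(cond_values)   # TRUE
length(cond_values) == nrow(mtcars)   # TRUE
```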

Upvotes: 3

Views: 52

Answers (1)

Artem Sokolov

Reputation: 13731

min and cyl can be easily handled by ensym(), which works with both strings and symbols. The expression am == 1 requires a little bit more work. Let's define a helper function that parses an object only if it's a string:

str2expr <- function(.x) {if( is.character(.x) ) rlang::parse_expr(.x) else .x}
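As a quick check of the helper in isolation (a sketch, assuming rlang is installed; `e1` and `e2` are illustrative names), a string and an already-quoted expression should end up as the same call object:

```r
library(rlang)

# Helper from above: parse only when the input is a string
str2expr <- function(.x) {if( is.character(.x) ) rlang::parse_expr(.x) else .x}

e1 <- str2expr("am == 1")        # string is parsed into a call
e2 <- str2expr(quote(am == 1))   # an existing call passes through unchanged

is.call(e1)         # TRUE
identical(e1, e2)   # TRUE
```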

We can now capture the argument provided to subset_arg and parse it if it's a string:

summarise_filtered_dplyrish <-
    function(df,
             subset_arg,
             summary_fun,
             select_col) {

        subset_expr <- rlang::enexpr(subset_arg) %>% str2expr()

        df %>%
            filter( !!subset_expr ) %>%
            summarise(across(.cols = {{select_col}}, .fns = !!ensym(summary_fun))) %>%
            pull( !!ensym(select_col) )
    }

summarise_filtered_dplyrish( mtcars, am == 1, min, cyl )        # Works
summarise_filtered_dplyrish( mtcars, "am == 1", "min", "cyl" )  # Also works

Brief explanation: {{x}} is shorthand for !!enquo(x), which captures both the expression provided to the function argument and the context in which that expression should be evaluated. Since your context is effectively defined by df, it's OK to relax enquo down to enexpr (which captures expressions but not the evaluation context) and ensym (which captures symbols or strings containing names of symbols).
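The difference between the three capture functions can be seen in a small sketch. The wrappers `capture_demo()` and `sym_demo()` are hypothetical names used only for this illustration, and the example assumes rlang is installed.

```r
library(rlang)

# Hypothetical wrapper: capture the same argument in two ways
capture_demo <- function(x) {
  list(
    quo  = enquo(x),    # expression plus its evaluation environment
    expr = enexpr(x)    # expression only
  )
}

# Hypothetical wrapper: ensym() accepts a bare name or a string
sym_demo <- function(x) ensym(x)

res <- capture_demo(am == 1)

is_quosure(res$quo)                           # TRUE
is.call(res$expr)                             # TRUE
identical(quo_get_expr(res$quo), res$expr)    # TRUE: same expression inside
identical(sym_demo(cyl), sym_demo("cyl"))     # TRUE: both give the symbol cyl
```

The quosure and the plain expression hold the same code; the quosure additionally remembers where the call came from, which matters when the expression refers to variables outside the data frame.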

Upvotes: 2
