Till
Till

Reputation: 707

Using tidy eval for multiple, arbitrary filter conditions

I would like to use tidy evaluation to write multiple, entirely flexible filter conditions. A related, but less complex problem has been solved in this Stackoverflow Question. The following code (which is an adaption from the mentioned other question) is working. It applies two filter conditions to the gapminder data set, and returns the filtered data.

library(tidyverse)
library(gapminder)

my_filter <- function(df, cols, vals){    
  paste_filter <- function(x, y) quo(!!sym(x) %in% {{y}})
  fp <- pmap(list(cols, vals), paste_filter)
  filter(df, !!!fp)
}

cols <- list("country", "year")
vals = list(c("Albania", "France"), c(2002, 2007))
gapminder %>% my_filter(cols, vals) 

The problem: So far this solution is restricted to one type of filter operator (%in%). I would like to extend this approach to accept arbitrary types of operators (==, %in%, >, ...). The intended function my_filter is supposed to handle the following:

cols <- list("country", "year")
ops <- list("%in%", ">=")
vals = list(c("Albania", "France"), 2007))
gapminder %>% my_filter(cols, ops, vals)

The use case that I have in the back of my mind are shiny apps. Using such a functionality, we could more easily let users set arbitrary filter conditions on variables of the data set.

Upvotes: 1

Views: 247

Answers (1)

Lionel Henry
Lionel Henry

Reputation: 6803

Create a list of calls and splice them in:

library(dplyr)
library(gapminder)

cols <- list("country", "year")
ops <- list("%in%", ">=")
vals <- list(c("Albania", "France"), 2007)

# Assumes LHS is the name of a variable and OP is
# the name of a function
op_call <- function(op, lhs, rhs) {
  call(op, sym(lhs), rhs)
}

my_filter <- function(data, cols, ops, vals) {
  exprs <- purrr::pmap(list(ops, cols, vals), op_call)
  data %>% dplyr::filter(!!!exprs)
}

gapminder %>% my_filter(cols, ops, vals)
#> # A tibble: 2 × 6
#>   country continent  year lifeExp      pop gdpPercap
#>   <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
#> 1 Albania Europe     2007    76.4  3600523     5937.
#> 2 France  Europe     2007    80.7 61083916    30470.

Here we don't have to worry about scoping issues because (a) the column names are assumed to be defined in the data mask, (b) the values are passed by value and inlined in the created calls, and (c) the functions are assumed to be binary operators and these are rarely redefined.

To allow custom user functions, there are two ways we could go about it. First, we could take an environment and create quosures manually with new_quosure():

op_call <- function(op, lhs, rhs, env = caller_env()) {
  new_quosure(call(op, sym(lhs), rhs), env)
}

my_filter <- function(data, cols, ops, vals, env = caller_env()) {
  exprs <- purrr::pmap(list(ops, cols, vals), op_call, env)
  data %>% dplyr::filter(!!!exprs)
}

gapminder %>% my_filter(cols, ops, vals)

local({
  my_op <- `%in%`
  gapminder %>% my_filter(cols, list("my_op", ">="), vals)
})
#> # A tibble: 2 × 6
#>   country continent  year lifeExp      pop gdpPercap
#>   <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
#> 1 Albania Europe     2007    76.4  3600523     5937.
#> 2 France  Europe     2007    80.7 61083916    30470.

The other way, perhaps simpler, is to allow the call to contain inlined functions. To that end, use rlang::call2() instead of base::call():

op_call <- function(op, lhs, rhs) {
  call2(op, sym(lhs), rhs)
}

my_filter <- function(data, cols, ops, vals) {
  exprs <- purrr::pmap(list(ops, cols, vals), op_call)
  data %>% dplyr::filter(!!!exprs)
}

local({
  my_op <- `%in%`
  gapminder %>% my_filter(cols, list(my_op, ">="), vals)
})
#> # A tibble: 2 × 6
#>   country continent  year lifeExp      pop gdpPercap
#>   <fct>   <fct>     <int>   <dbl>    <int>     <dbl>
#> 1 Albania Europe     2007    76.4  3600523     5937.
#> 2 France  Europe     2007    80.7 61083916    30470.

The downside of inlining functions is that this will prevent optimisations and transportability to other dplyr backends.

Upvotes: 5

Related Questions