Reputation: 707
I would like to use tidy evaluation to write multiple, entirely flexible filter conditions. A related but less complex problem has been solved in this Stack Overflow question. The following code (an adaptation from that question) works: it applies two filter conditions to the gapminder data set and returns the filtered data.
library(tidyverse)
library(gapminder)
my_filter <- function(df, cols, vals) {
  paste_filter <- function(x, y) quo(!!sym(x) %in% {{y}})
  fp <- pmap(list(cols, vals), paste_filter)
  filter(df, !!!fp)
}
cols <- list("country", "year")
vals <- list(c("Albania", "France"), c(2002, 2007))
gapminder %>% my_filter(cols, vals)
The problem: So far this solution is restricted to one type of filter operator (%in%). I would like to extend this approach to accept arbitrary operators (==, %in%, >, ...). The intended function my_filter should handle the following:
cols <- list("country", "year")
ops <- list("%in%", ">=")
vals <- list(c("Albania", "France"), 2007)
gapminder %>% my_filter(cols, ops, vals)
The use case I have in the back of my mind is Shiny apps. With such functionality, we could more easily let users set arbitrary filter conditions on variables of the data set.
Upvotes: 1
Views: 247
Reputation: 6803
Create a list of calls and splice them in:
library(dplyr)
library(rlang)   # for sym(), new_quosure(), caller_env(), and call2() used below
library(gapminder)
cols <- list("country", "year")
ops <- list("%in%", ">=")
vals <- list(c("Albania", "France"), 2007)
# Assumes lhs is the name of a variable and op is the name of a function
op_call <- function(op, lhs, rhs) {
  call(op, sym(lhs), rhs)
}

my_filter <- function(data, cols, ops, vals) {
  exprs <- purrr::pmap(list(ops, cols, vals), op_call)
  data %>% dplyr::filter(!!!exprs)
}
gapminder %>% my_filter(cols, ops, vals)
#> # A tibble: 2 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Albania Europe 2007 76.4 3600523 5937.
#> 2 France Europe 2007 80.7 61083916 30470.
Here we don't have to worry about scoping issues because (a) the column names are assumed to be defined in the data mask, (b) the values are passed by value and inlined in the created calls, and (c) the functions are assumed to be binary operators and these are rarely redefined.
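To see what actually gets spliced into filter(), you can inspect the generated calls directly (a quick sketch using the op_call() helper above; the exact printing may differ slightly):

purrr::pmap(list(ops, cols, vals), op_call)
#> [[1]]
#> country %in% c("Albania", "France")
#>
#> [[2]]
#> year >= 2007

The literal values are embedded in the calls themselves, which is why this first version does not need to capture any environment.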
To allow custom user functions, there are two ways we could go about it. First, we could take an environment and create quosures manually with new_quosure():
op_call <- function(op, lhs, rhs, env = caller_env()) {
  new_quosure(call(op, sym(lhs), rhs), env)
}

my_filter <- function(data, cols, ops, vals, env = caller_env()) {
  exprs <- purrr::pmap(list(ops, cols, vals), op_call, env = env)
  data %>% dplyr::filter(!!!exprs)
}
gapminder %>% my_filter(cols, ops, vals)

local({
  my_op <- `%in%`
  gapminder %>% my_filter(cols, list("my_op", ">="), vals)
})
#> # A tibble: 2 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Albania Europe 2007 76.4 3600523 5937.
#> 2 France Europe 2007 80.7 61083916 30470.
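To check what my_filter() now builds, you can inspect a single quosure on its own (a small sketch, assuming rlang is attached; the environment shown depends on where you call op_call() from):

q <- op_call("my_op", "country", c("Albania", "France"))
quo_get_expr(q)
#> my_op(country, c("Albania", "France"))
quo_get_env(q)
#> <environment: R_GlobalEnv>

Because each quosure carries the caller's environment, filter() can find my_op even though it is only defined inside the local() block above.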
The other way, perhaps simpler, is to allow the call to contain inlined functions. To that end, use rlang::call2() instead of base::call():
op_call <- function(op, lhs, rhs) {
  call2(op, sym(lhs), rhs)
}

my_filter <- function(data, cols, ops, vals) {
  exprs <- purrr::pmap(list(ops, cols, vals), op_call)
  data %>% dplyr::filter(!!!exprs)
}
local({
  my_op <- `%in%`
  gapminder %>% my_filter(cols, list(my_op, ">="), vals)
})
#> # A tibble: 2 × 6
#> country continent year lifeExp pop gdpPercap
#> <fct> <fct> <int> <dbl> <int> <dbl>
#> 1 Albania Europe 2007 76.4 3600523 5937.
#> 2 France Europe 2007 80.7 61083916 30470.
The downside of inlining functions is that it prevents optimisations and portability to other dplyr backends.
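You can see the difference by inspecting a call built from a function object rather than a name (a hypothetical illustration, not part of the original answer): the closure itself is stored in the call, so there is nothing a backend such as dbplyr could deparse and translate.

expr <- call2(`%in%`, sym("country"), c("Albania", "France"))
is.function(expr[[1]])
#> [1] TRUE
# With base::call() and the string "%in%", expr[[1]] would be the symbol
# `%in%`, which deparses cleanly and can be translated by other backends.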
Upvotes: 5