Reputation: 291
I am working on filtering a data frame using dplyr. The problem is that the predicates differ between columns.
Please find below a minimal example with three columns and three predicates:
library(tidyverse)
set.seed(123)
# purrr::rerun() is deprecated; map() draws the same three vectors in the same order
dframe <- map(1:3, ~ rnorm(5)) %>%
  set_names(paste0("var", 1:3)) %>%
  data.frame()
cond <- c(2, 1, -1.4)
dframe %>% filter(var1 < cond[1] & var2 < cond[2] & var3 > cond[3])
Is there any way to filter the data set without explicitly stating the predicates in filter()?
Edit: One obvious solution is a for loop (see the code below), but there may be more elegant solutions.
dframe_help <- dframe
cond <- c(2, 1, -1.4)
isSmaller <- c(TRUE, TRUE, FALSE)
for (i in seq_along(cond)) {
  if (isSmaller[i]) {
    dframe_help <- dframe_help %>%
      filter_at(.vars = vars(num_range(prefix = "var", range = i)),
                .vars_predicate = all_vars(. < cond[i]))
  } else {
    dframe_help <- dframe_help %>%
      filter_at(.vars = vars(num_range(prefix = "var", range = i)),
                .vars_predicate = all_vars(. > cond[i]))
  }
}
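The same one-condition-at-a-time narrowing can also be written without reassigning inside a loop, by folding over the column indices with base R's Reduce(). This is only a sketch: filter_sequential() is a hypothetical name, and it assumes column i is the one compared with cond[i] (the loop above selects it by the name var<i> instead).

```r
# Hypothetical helper (not from the question): fold over the column indices
# with Reduce(), narrowing the data frame one condition at a time, as the
# for loop does. Assumes column i is the one compared with cond[i].
filter_sequential <- function(df, cond, is_smaller) {
  Reduce(
    function(acc, i) {
      col <- acc[[i]]
      keep <- if (is_smaller[i]) col < cond[i] else col > cond[i]
      # which() drops NA comparisons, as dplyr::filter() does
      acc[which(keep), , drop = FALSE]
    },
    seq_along(cond),  # the values folded over: 1, 2, 3, ...
    df                # initial accumulator: the full data frame
  )
}
```

Usage would be filter_sequential(dframe, c(2, 1, -1.4), c(TRUE, TRUE, FALSE)).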
Upvotes: 1
Views: 618
Reputation: 28675
You need some sort of object to specify whether to use < or >. I've created one called less, which is 1 for < and 0 for >.
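library(purrr)
library(magrittr)
# less: 1 means "column < cond", 0 means "column > cond"
filter2 <- function(dframe, cond, less){
  # one logical vector per column, comparing it with its own threshold
  rows <- pmap(list(cond, less, dframe),
               function(cond, less, x) if (less) x < cond else x > cond) %>%
    # combine the vectors row-wise: keep a row only if every test is TRUE
    pmap_lgl(all)
  dframe[rows, ]
}
dframe %>% filter2(cond = c(2, 1, -1.4), less = c(1, 1, 0))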
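Another way to use the same 1/0 flag is to turn it into a sign and test every column at once: "x < cond" is the same as "x - cond < 0", and "x > cond" is the same as "-(x - cond) < 0". The sketch below (filter_sign() is a hypothetical name, not part of the answer) assumes all columns are numeric, so as.matrix() gives a numeric matrix, and that columns are in the same order as cond and less.

```r
# Hypothetical helper: encode `less` (1 = "<", 0 = ">") as a sign and
# compare all columns in one vectorised step.
filter_sign <- function(df, cond, less) {
  s <- ifelse(less == 1, 1, -1)        # +1 for "<", -1 for ">"
  d <- sweep(as.matrix(df), 2, cond)   # subtract cond[j] from column j
  d <- sweep(d, 2, s, `*`)             # flip the sign of ">" columns
  keep <- rowSums(d < 0) == ncol(d)    # every column must pass its test
  df[which(keep), , drop = FALSE]      # which() drops rows with NA
}
```

Ties (x == cond) give 0, which is rejected by both < and >, so the strict comparisons of filter2() are preserved.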
Or, explicitly pass the function you want to use for each variable.
filter3 <- function(df, y, fun){
  # apply fun[[j]](column j, y[j]) to each column, then AND the results row-wise
  keep <- pmap(list(df, y, fun), function(x, y, fun) fun(x, y)) %>%
    pmap_lgl(all)
  df[keep, ]
}
dframe %>% filter3(y = c(2, 1, -1.4), fun = list(`<`, `<`, `>`))
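For comparison, the same "one function per column" idea works in base R without purrr, using Map() to apply each function to its column and Reduce() to AND the resulting logical vectors. This is a sketch; filter_fun() is a hypothetical name.

```r
# Base-R analogue of filter3(): apply fun[[j]](column j, y[j]) to each
# column, then combine the logical vectors with a row-wise AND.
filter_fun <- function(df, y, fun) {
  tests <- Map(function(x, y, f) f(x, y), df, y, fun)
  keep <- Reduce(`&`, tests)
  df[which(keep), , drop = FALSE]   # which() drops rows where a test is NA
}
```

Usage mirrors filter3(): filter_fun(dframe, y = c(2, 1, -1.4), fun = list(`<`, `<`, `>`)).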
Upvotes: 3
Reputation: 478
Not sure what you mean by 'automating' this process, but here are a couple of options.
If you want to filter along multiple features with some extra clarity, you can create a standalone filtering function:
cond <- c(2, 1, -1.4)
filter_using_conditions <- function(df) {
df[df$var1 < cond[1] & df$var2 < cond[2] & df$var3 > cond[3],]
}
dframe %>%
filter_using_conditions()
var1 var2 var3
2 0.4978505 -0.2179749 0.8377870
3 -1.9666172 -1.0260044 0.1533731
4 0.7013559 -0.7288912 -1.1381369
5 -0.4727914 -0.6250393 1.2538149
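If you would rather keep dplyr::filter() inside such a standalone function without hard-coding the column names, one option (not part of this answer) is to build each comparison as an expression with rlang and splice the list into filter(). The sketch assumes dplyr and rlang are installed; filter_spliced() is a hypothetical name.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: build calls such as `var1 < 2` from parallel vectors
# of column names, operators and thresholds, then splice them into filter(),
# which combines separate predicates with AND.
filter_spliced <- function(df, cols, ops, values) {
  preds <- Map(
    function(col, op, value) call2(op, sym(col), value),
    cols, ops, values
  )
  # unname(): filter() rejects named arguments
  filter(df, !!!unname(preds))
}
```

With the data above this would be filter_spliced(dframe, c("var1", "var2", "var3"), c("<", "<", ">"), c(2, 1, -1.4)).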
If you want to implement a solution using vectors of operators and values, you can do some string manipulation and use base::eval() or glue::evaluate() to generate a logical vector for subsetting your data frame. Here's an example using purrr::map() and purrr::map2() (it's not very elegant, but hopefully it gets the point across):
cond <- c(2, 1, -1.4)
operators <- c("<", "<", ">")
filter_conditions <- function(dframe, conds, operators) {
x <- paste(operators, conds, sep = " ")
rows_to_use <- map2(dframe, x, paste) %>%
map(map_lgl, glue::evaluate, NULL) %>%
as_tibble() %>%
na_if(FALSE) %>%
complete.cases()
dframe[rows_to_use,]
}
filter_conditions(dframe, cond, operators)
var1 var2 var3
2 0.4978505 -0.2179749 0.8377870
3 -1.9666172 -1.0260044 0.1533731
4 0.7013559 -0.7288912 -1.1381369
5 -0.4727914 -0.6250393 1.2538149
This example uses purrr::map2() to generate an individual string for each data point from the specified operator-condition pairings, and then uses glue::evaluate() together with purrr::map() and purrr::map_lgl() to execute those strings as commands and return logical vectors. dplyr::na_if() turns every FALSE into NA, so that complete.cases() can then return a logical vector marking the rows that satisfy all conditions.
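To see why the NA step followed by complete.cases() acts as a row-wise AND, here is a small base-R sketch on a made-up logical table (one column per condition, one row per observation). Base subassignment stands in for dplyr::na_if() here.

```r
# Toy results of two conditions for three rows (values made up)
tests <- data.frame(var1 = c(TRUE, TRUE, FALSE),
                    var2 = c(TRUE, FALSE, TRUE))

# Replace FALSE with NA, the same idea as na_if(FALSE) in the answer
tests_na <- tests
tests_na[tests_na == FALSE] <- NA

# A row is "complete" only if no condition was FALSE
keep <- complete.cases(tests_na)
keep  # TRUE FALSE FALSE
```

For logical columns without missing values this gives the same result as Reduce(`&`, tests).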
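For reference, these are the strings that map2() builds (x is created inside filter_conditions(), so it is recreated here):
x <- paste(operators, cond, sep = " ")
map2(dframe, x, paste)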
$var1
[1] "1.78691313680308 < 2" "0.497850478229239 < 2" "-1.96661715662964 < 2" "0.701355901563686 < 2"
[5] "-0.472791407727934 < 2"
$var2
[1] "-1.06782370598685 < 1" "-0.217974914658295 < 1" "-1.02600444830724 < 1" "-0.72889122929114 < 1"
[5] "-0.625039267849257 < 1"
$var3
[1] "-1.68669331074241 > -1.4" "0.837787044494525 > -1.4" "0.153373117836515 > -1.4"
[4] "-1.13813693701195 > -1.4" "1.25381492106993 > -1.4"
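A more compact variant of the string-evaluation idea, using only base R's parse() and eval(), is to build one string per column (for example "var1 < 2") and evaluate it with the data frame as the environment. This is a sketch: filter_eval() is a hypothetical name, it assumes the columns are named and ordered like ops and conds, and evaluating strings is only sensible when they come from your own code, not from user input.

```r
# Hypothetical helper: build strings such as "var1 < 2", evaluate each one
# with the data frame's columns in scope, then AND the logical vectors.
filter_eval <- function(df, ops, conds) {
  exprs <- paste(names(df), ops, conds)   # e.g. "var3 > -1.4"
  tests <- lapply(exprs, function(e) eval(parse(text = e), envir = df))
  keep <- Reduce(`&`, tests)
  df[which(keep), , drop = FALSE]
}
```

Usage: filter_eval(dframe, c("<", "<", ">"), c(2, 1, -1.4)).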
Upvotes: 0