Nussig

Reputation: 291

Filter each column with a different condition

I am filtering a data frame using dplyr. The problem is that the predicate differs from column to column.

Please find below a minimal example with three columns and three predicates:

library(tidyverse)

set.seed(123)
dframe <- rerun(3, rnorm(5)) %>%
  set_names(paste0("var", 1:3)) %>% 
  data.frame

cond <- c(2, 1, -1.4)
dframe %>% filter(var1 < cond[1] & var2 < cond[2] & var3 > cond[3])

Is there any way to filter the data set without explicitly stating the predicates in filter?

Edit: An obvious solution is a for loop, as in the code below, but there might be a more elegant one.

dframe_help <- dframe
cond <- c(2, 1, -1.4)
isSmaller <- c(TRUE, TRUE, FALSE)
for(i in seq_along(cond)) {
  if (isSmaller[i])
    dframe_help <- dframe_help %>% filter_at(.vars = vars(num_range(prefix = "var", range = i)), 
                                             .vars_predicate = all_vars(. < cond[i]))
  else
    dframe_help <- dframe_help %>% filter_at(.vars = vars(num_range(prefix = "var", range = i)), 
                                            .vars_predicate = all_vars(. > cond[i]))
}

Upvotes: 1

Views: 618

Answers (2)

IceCreamToucan

Reputation: 28675

You need some sort of object to specify whether to use < or >. I've created one called less, which is 1 for < and 0 for >.

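library(purrr); library(magrittr)
filter2 <- function(dframe, cond, less) {
  # One logical vector per column: x < cond where less is 1, x > cond otherwise
  rows <- pmap(list(cond, less, dframe),
               function(cond, less, x) if (less) x < cond else x > cond) %>%
    # Keep a row only if it passes the test in every column
    pmap_lgl(all)
  dframe[rows, ]
}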

dframe %>% filter2(cond = c(2, 1, -1.4), less = c(1, 1, 0))

Or, explicitly pass the function you want to use for each variable.

filter3 <- function(df, y, fun) {
  # Apply fun[[i]] to column i and y[i], then keep rows where all tests pass
  rows <- pmap(list(df, y, fun), function(x, y, fun) fun(x, y)) %>%
    pmap_lgl(all)
  df[rows, ]
}


dframe %>% filter3(y = c(2, 1, -1.4), fun = list(`<`, `<`, `>`))
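For comparison, the same idea can be sketched in base R without purrr. This is only an illustration under assumptions: the helper name filter_by_ops is hypothetical, and the sample data frame is hard-coded from rounded values rather than generated with rerun(), so that the snippet runs on its own.

```r
# Hypothetical sample data (rounded values); any numeric data frame works.
dframe <- data.frame(
  var1 = c(1.79, 0.50, -1.97, 0.70, -0.47),
  var2 = c(-1.07, -0.22, -1.03, -0.73, -0.63),
  var3 = c(-1.69, 0.84, 0.15, -1.14, 1.25)
)

# filter_by_ops is a hypothetical helper: column i is compared with cond[i]
# using funs[[i]], and a row is kept only if every comparison is TRUE.
filter_by_ops <- function(df, cond, funs) {
  stopifnot(length(cond) == ncol(df), length(funs) == ncol(df))
  # One logical vector per column.
  checks <- Map(function(x, y, f) f(x, y), df, cond, funs)
  # Combine the per-column vectors element-wise with &.
  keep <- Reduce(`&`, checks)
  df[keep, , drop = FALSE]
}

result <- filter_by_ops(dframe, c(2, 1, -1.4), list(`<`, `<`, `>`))
print(result)
```

With this sample data only the first row is dropped, because its var3 value is not greater than -1.4.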

Upvotes: 3

anant

Reputation: 478

Not sure what you mean by 'automating' this process, but here are a couple of options.

If you want to filter along multiple features with some extra clarity, you can create a standalone filtering function:

cond <- c(2, 1, -1.4)
filter_using_conditions <- function(df) {
  df[df$var1 < cond[1] & df$var2 < cond[2] & df$var3 > cond[3],]
}
dframe %>%
  filter_using_conditions()
        var1       var2       var3
2  0.4978505 -0.2179749  0.8377870
3 -1.9666172 -1.0260044  0.1533731
4  0.7013559 -0.7288912 -1.1381369
5 -0.4727914 -0.6250393  1.2538149

If you want to implement a solution using vectors of operators and values, you can try some string manipulation and use base::eval() or glue::evaluate() to generate a logical vector for subsetting your data frame. Here's an example using purrr::map() and purrr::map2() (it's not very elegant, but it hopefully gets the point across):

cond <- c(2, 1, -1.4)
operators <- c("<", "<", ">")

filter_conditions <- function(dframe, conds, operators) {
  x <- paste(operators, conds, sep = " ")
  rows_to_use <- map2(dframe, x, paste) %>%
    map(map_lgl, glue::evaluate, NULL) %>%
    as_tibble() %>%
    na_if(FALSE) %>%
    complete.cases()
  dframe[rows_to_use,]
}
filter_conditions(dframe, cond, operators)
        var1       var2       var3
2  0.4978505 -0.2179749  0.8377870
3 -1.9666172 -1.0260044  0.1533731
4  0.7013559 -0.7288912 -1.1381369
5 -0.4727914 -0.6250393  1.2538149   

This example uses purrr::map2() to generate one string per data point from the specified operator-condition pairs, then uses glue::evaluate() with purrr::map() and purrr::map_lgl() to execute those strings as commands and return logical vectors. dplyr::na_if() turns every FALSE into NA, so that complete.cases() can then return a logical vector flagging the rows in which all conditions hold. The intermediate strings look like this:

map2(dframe, x, paste)
$var1
[1] "1.78691313680308 < 2"   "0.497850478229239 < 2"  "-1.96661715662964 < 2"  "0.701355901563686 < 2" 
[5] "-0.472791407727934 < 2"

$var2
[1] "-1.06782370598685 < 1"  "-0.217974914658295 < 1" "-1.02600444830724 < 1"  "-0.72889122929114 < 1" 
[5] "-0.625039267849257 < 1"

$var3
[1] "-1.68669331074241 > -1.4" "0.837787044494525 > -1.4" "0.153373117836515 > -1.4"
[4] "-1.13813693701195 > -1.4" "1.25381492106993 > -1.4" 
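If building and evaluating strings feels fragile, one possible alternative is to look up each operator by name with base::match.fun() and call it directly. The sketch below is an assumption-laden illustration, not part of the original answer: filter_by_op_names is a hypothetical helper, and the data frame is hard-coded with rounded versions of the values shown above so the snippet is self-contained.

```r
# Hypothetical sample data: rounded versions of the values printed above.
dframe <- data.frame(
  var1 = c(1.79, 0.50, -1.97, 0.70, -0.47),
  var2 = c(-1.07, -0.22, -1.03, -0.73, -0.63),
  var3 = c(-1.69, 0.84, 0.15, -1.14, 1.25)
)
cond <- c(2, 1, -1.4)
operators <- c("<", "<", ">")

# filter_by_op_names is a hypothetical helper. match.fun() turns an operator
# name such as "<" into the function itself, so no code string is evaluated.
filter_by_op_names <- function(df, conds, operators) {
  stopifnot(length(conds) == ncol(df), length(operators) == ncol(df))
  # One logical vector per column: column i compared with conds[i].
  checks <- Map(function(x, op, y) match.fun(op)(x, y), df, operators, conds)
  # A row is kept only if all of its per-column checks are TRUE.
  df[Reduce(`&`, checks), , drop = FALSE]
}

filtered <- filter_by_op_names(dframe, cond, operators)
print(filtered)
```

Because the operators stay ordinary functions, there is no need for the na_if() and complete.cases() step; the per-column logical vectors are simply combined with &.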

Upvotes: 0
