MLEN

Reputation: 2561

Filtering all columns depending on one column using dplyr

I would like to keep the rows where at least one column other than P is greater than P, using dplyr. I am trying to find a solution that works across all columns without naming each one.

Example

library(dplyr)

df <- tibble(P = c(2, 4, 5, 6, 1.4),
             B = c(2.1, 3, 5.5, 1.2, 2),
             C = c(2.2, 3.8, 5.7, 5, 1.5))

Desired output

df <- filter(df, B > P | C > P)
df

One solution using apply, which I would like to avoid if possible:

# x[1] is P (the first column); keep the row if any other column exceeds it
filter(df, apply(df, 1, function(x) any(x[-1] > x[1])))

Upvotes: 0

Views: 451

Answers (2)

akrun

Reputation: 887951

Here is a tidyverse option. map and reduce from purrr build a single logical vector (one comparison per column, combined with |), and extract from magrittr uses that vector to subset the rows of the original dataset

library(tidyverse)
library(magrittr)
df %>% 
    select(-one_of("P")) %>% 
    map(~ .> df$P) %>% 
    reduce(`|`) %>%
    extract(df, .,)
# A tibble: 3 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1   2.0   2.1   2.2
#2   5.0   5.5   5.7
#3   1.4   2.0   1.5
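
To make the intermediate steps of the pipeline above visible, here is a minimal sketch that runs them one at a time. It assumes the same example data as in the question; the names flags, keep and result are only illustrative, and the last step uses base subsetting instead of magrittr's extract.

```r
library(purrr)

df <- data.frame(P = c(2, 4, 5, 6, 1.4),
                 B = c(2.1, 3, 5.5, 1.2, 2),
                 C = c(2.2, 3.8, 5.7, 5, 1.5))

# One logical vector per non-P column: is that column larger than P?
flags <- map(df[setdiff(names(df), "P")], ~ .x > df$P)
# flags$B is TRUE FALSE TRUE FALSE TRUE
# flags$C is TRUE FALSE TRUE FALSE TRUE

# Collapse the list into one vector with element-wise OR
keep <- reduce(flags, `|`)

# Subset the original rows with the combined vector
result <- df[keep, ]
```

With more columns the list simply gets longer; reduce still folds it into a single row selector.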

This can also be wrapped in a function using the devel version of dplyr (soon to be released as 0.6.0), which introduced quosures and unquoting for evaluation. enquo is similar to substitute from base R: it takes the user input and converts it to a quosure. Because one_of takes string arguments, the quosure is converted to a string with quo_name

funFilter <- function(dat, colToCompare){
   colToCompare <- quo_name(enquo(colToCompare))

   dat %>%
       select(-one_of(colToCompare)) %>%
       map(~ .> dat[[colToCompare]]) %>%
       reduce(`|`) %>%
       extract(dat, ., )
}

funFilter(df, P) # compare all other columns with P
# A tibble: 3 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1   2.0   2.1   2.2
#2   5.0   5.5   5.7
#3   1.4   2.0   1.5

funFilter(df, B) # compare all other columns with B
# A tibble: 4 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1     2   2.1   2.2
#2     4   3.0   3.8
#3     5   5.5   5.7
#4     6   1.2   5.0
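
The quoting step on its own may be easier to follow in isolation. This is a small sketch, not part of the function above: col_as_string is a hypothetical helper name, and it uses as_name from rlang, which I believe is the currently recommended way to turn a quoted column name into a string (quo_name is what the answer itself uses).

```r
library(rlang)

# Capture the bare column name the caller typed and return it as a string
col_as_string <- function(col) {
  as_name(enquo(col))
}

name_p <- col_as_string(P)  # "P"; P does not need to exist as an object
```

The string can then be used for select(-one_of(...)) and for dat[[...]], as in funFilter.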

We can also build the condition as a string and parse it into an expression

v1 <- setdiff(names(df), "P")
filter(df, !!rlang::parse_quosure(paste(v1, "P", sep=" > ", collapse=" | ")))
# A tibble: 3 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1   2.0   2.1   2.2
#2   5.0   5.5   5.7
#3   1.4   2.0   1.5
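
To show what string is actually being parsed, here is a hedged sketch of the same idea with the intermediate text kept in a variable. It uses rlang::parse_expr rather than parse_quosure, since parse_quosure was, as far as I know, renamed in later rlang releases; the names others, cond_text and cond are illustrative. It assumes all column names are syntactic R names, otherwise the pasted string would not parse.

```r
library(dplyr)

df <- tibble(P = c(2, 4, 5, 6, 1.4),
             B = c(2.1, 3, 5.5, 1.2, 2),
             C = c(2.2, 3.8, 5.7, 5, 1.5))

others <- setdiff(names(df), "P")

# Build the condition as text: "B > P | C > P"
cond_text <- paste(others, "P", sep = " > ", collapse = " | ")

# Turn the text into an expression and unquote it inside filter()
cond <- rlang::parse_expr(cond_text)
result <- filter(df, !!cond)
```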

This can also be made into a function

funFilter2 <- function(dat, colToCompare){
    colToCompare <- quo_name(enquo(colToCompare))
    v1 <- setdiff(names(dat), colToCompare)
    expr <- rlang::parse_quosure(paste(v1, colToCompare, sep= " > ", collapse= " | "))
    dat %>%
        filter(!!expr)
}

funFilter2(df, P)
# A tibble: 3 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1   2.0   2.1   2.2
#2   5.0   5.5   5.7
#3   1.4   2.0   1.5

funFilter2(df, B)
# A tibble: 4 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1     2   2.1   2.2
#2     4   3.0   3.8
#3     5   5.5   5.7
#4     6   1.2   5.0

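Another approach is pmax, which returns the row-wise maximum over all columns (P included); a row is kept when that maximum is greater than P, i.e. when some other column exceeds P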

df %>%
   filter(do.call(pmax, .) > P)
# A tibble: 3 × 3
#      P     B     C
#  <dbl> <dbl> <dbl>
#1   2.0   2.1   2.2
#2   5.0   5.5   5.7
#3   1.4   2.0   1.5
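
For readers on a newer dplyr than the one this answer was written against, the same "any other column greater than P" test can probably be written directly with if_any. This is a sketch that assumes dplyr 1.0.4 or later, where if_any is available; it is not one of the approaches above.

```r
library(dplyr)

df <- tibble(P = c(2, 4, 5, 6, 1.4),
             B = c(2.1, 3, 5.5, 1.2, 2),
             C = c(2.2, 3.8, 5.7, 5, 1.5))

# -P selects every column except P; the lambda is applied to each of them,
# and a row is kept when at least one comparison is TRUE
result <- df %>%
  filter(if_any(-P, ~ .x > P))
```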

Upvotes: 2

Andrew Gustar

Reputation: 18435

Without dplyr...

df2 <- df[df$P!=apply(df,1,max),]

or with dplyr...

df3 <- df %>% filter(P!=apply(df,1,max))

Upvotes: 2
