Filtering with dplyr by using two separate selection criteria involving two columns

Question

I'm trying to conditionally filter a data frame to extract the rows of interest. This differs from ordinary conditional filtering in that the rules vary and apply to pairs of values across two columns.

My reprex below simulates a data.frame with 4 samples (Control, Drug_1, Drug_2, and Drug_3) and the pairwise comparisons among them (each difference is reported as the p_value). I'd like to use this piece of code in a function, potentially to compare more than 4 groups. I tried combining the filtering criteria with OR operators, but I ended up with rather ugly code.

My end goal is a filtered_df containing every row whose group1 and group2 values form one of the pairs in my comparisons list. Any help is appreciated!

Best, Atakan

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Make a mock data frame
gene <- "ABCD1"
group1 <- c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2")
group2 <- c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3")
p_value <- c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9)

df <- data.frame(gene, group1, group2, p_value)
df
#>    gene  group1 group2 p_value
#> 1 ABCD1 Control Drug_1   0.400
#> 2 ABCD1 Control Drug_2   0.001
#> 3 ABCD1 Control Drug_3   0.003
#> 4 ABCD1  Drug_1 Drug_2   0.010
#> 5 ABCD1  Drug_1 Drug_3   0.300
#> 6 ABCD1  Drug_2 Drug_3   0.900

# I'd like to keep the rows where group1 and group2 match one of the following pairs
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))


# I can filter by using one pair as follows:
filtered_df <- df %>%
  filter(group1 == comparisons[[1]][1] & group2 == comparisons[[1]][2])

filtered_df
#>    gene  group1 group2 p_value
#> 1 ABCD1 Control Drug_1     0.4

Created on 2018-06-29 by the reprex package (v0.2.0).

akrun · Accepted Answer

We can do this in a couple of ways.

1) One way is to loop through the list ('comparisons'), filter the dataset for each pair individually, and bind the outputs together by rows (map_df)

library(tidyverse)
map_df(comparisons, ~ df %>%
                         filter(group1 == .x[1] & group2 == .x[2]))
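
Since the question mentions wrapping this in a function and combining the criteria with OR, here is a minimal base R sketch of that idea. It is a swapped-in alternative to map_df, not part of the original answer: the function name filter_by_pairs is hypothetical, and it assumes the columns are literally named group1 and group2 and hold character values.

```r
# Mock data from the question (stringsAsFactors = FALSE keeps the groups as character)
df <- data.frame(
  gene    = "ABCD1",
  group1  = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2  = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9),
  stringsAsFactors = FALSE
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

# Hypothetical helper: keep the rows whose (group1, group2) equals any pair in `pairs`
filter_by_pairs <- function(data, pairs) {
  # One logical vector per pair: TRUE where the row matches that pair
  masks <- lapply(pairs, function(pair) {
    data$group1 == pair[1] & data$group2 == pair[2]
  })
  # Combine all the per-pair vectors with OR
  keep <- Reduce(`|`, masks)
  data[keep, , drop = FALSE]
}

filtered_df <- filter_by_pairs(df, comparisons)
filtered_df
```

Because the pairs are handled by lapply, the same call should work unchanged for a longer comparisons list.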

2) Another option is to convert the list to a data.frame and do an inner_join with the original dataset

do.call(rbind, comparisons) %>% # rbind the pairs into a two-column matrix
         as.data.frame %>% # convert to a data.frame
         set_names(c("group1", "group2")) %>% # rename the columns to match df
         inner_join(df, by = c("group1", "group2")) # keep only the matching rows
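
A closely related variant, not from the original answer, is dplyr's semi_join, which filters df by the pairs table while keeping df's own columns and row order. The sketch below assumes the group columns are character (hence stringsAsFactors = FALSE) and uses pairs as an illustrative name for the lookup table.

```r
library(dplyr)

# Mock data from the question
df <- data.frame(
  gene    = "ABCD1",
  group1  = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2  = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9),
  stringsAsFactors = FALSE
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

# Turn the list of pairs into a two-column lookup table
pairs <- as.data.frame(do.call(rbind, comparisons), stringsAsFactors = FALSE)
names(pairs) <- c("group1", "group2")

# semi_join keeps the rows of df that have a match in pairs,
# without adding any columns from pairs
filtered_df <- semi_join(df, pairs, by = c("group1", "group2"))
filtered_df
```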

3) Or using merge from base R (similar to 2)

merge(df, as.data.frame(do.call(rbind, comparisons)),
            by.x = c("group1", "group2"), by.y = c("V1", "V2"))
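
One thing to keep in mind with merge is that it places the key columns first and, by default, sorts the result by them, so the output layout can differ from df. A small sketch of restoring the original column order, assuming character group columns; merged and pairs are illustrative names:

```r
# Mock data from the question
df <- data.frame(
  gene    = "ABCD1",
  group1  = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2  = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9),
  stringsAsFactors = FALSE
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

# An unnamed matrix becomes a data.frame with columns V1 and V2
pairs <- as.data.frame(do.call(rbind, comparisons), stringsAsFactors = FALSE)

merged <- merge(df, pairs,
                by.x = c("group1", "group2"), by.y = c("V1", "V2"))

# The key columns now come first; reorder to match df's layout
merged <- merged[names(df)]
merged
```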
