Reputation: 446
I'm trying to conditionally filter a data frame to extract the rows of interest. This differs from generic conditional filtering in that the rules vary and apply to pairs of columns.
My reprex below simulates a data.frame involving 4 samples (Control, Drug_1, Drug_2, and Drug_3) and the pairwise comparisons among them (the difference is shown as the p_value). I'd like to use this piece of code in a function to potentially compare more than 4 groups. I tried combining the filtering criteria with OR operators, but I ended up with rather ugly code.
My end goal is to obtain a filtered_df that shows all the rows in which the variables group1 and group2 contain a pair that is in my comparisons list. Any help is appreciated!
Best, Atakan
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Make a mock data frame
gene <- "ABCD1"
group1 <- c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2")
group2 <- c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3")
p_value <- c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9)
df <- data.frame(gene, group1, group2, p_value)
df
#> gene group1 group2 p_value
#> 1 ABCD1 Control Drug_1 0.400
#> 2 ABCD1 Control Drug_2 0.001
#> 3 ABCD1 Control Drug_3 0.003
#> 4 ABCD1 Drug_1 Drug_2 0.010
#> 5 ABCD1 Drug_1 Drug_3 0.300
#> 6 ABCD1 Drug_2 Drug_3 0.900
# I'd like to filter rows when group1 and group2 matches the following pairs
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))
# I can filter by using one pair as follows:
filtered_df <- df %>%
  filter(group1 == comparisons[[1]][1] & group2 == comparisons[[1]][2])
filtered_df
#> gene group1 group2 p_value
#> 1 ABCD1 Control Drug_1 0.4
Created on 2018-06-29 by the reprex package (v0.2.0).
Upvotes: 3
Views: 2489
Reputation: 887691
We can do this in a couple of ways.
1) One way is to loop through the list ('comparisons'), filter the dataset for each pair individually, and bind the outputs together (map_df)
library(tidyverse)
map_df(comparisons, ~ df %>%
  filter(group1 == .x[1] & group2 == .x[2]))
2) Another option is to convert the list to a data.frame and do an inner_join with the first dataset
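do.call(rbind, comparisons) %>%               # rbind the pairs into a two-column matrix
  as.data.frame %>%                           # convert to a data.frame (columns V1, V2)
  set_names(c("group1", "group2")) %>%        # rename the columns to match df
  inner_join(df, by = c("group1", "group2"))  # keep only the rows of df that match a pair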
3) Or use merge from base R (similar to 2)
merge(df, as.data.frame(do.call(rbind, comparisons)),
      by.x = c("group1", "group2"), by.y = c("V1", "V2"))
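Since the question asks for something reusable with more than 4 groups, the join idea could be wrapped in a function. The following is only a sketch: filter_pairs is a hypothetical helper name, not something from the question or from any package, and it assumes every element of the list is a length-2 character vector in (group1, group2) order. It uses semi_join rather than inner_join, which is a swapped-in variant of option 2 that keeps the original columns and row order of the data.

```r
library(dplyr)

# Hypothetical helper (an assumption, not part of the original answer):
# keep the rows of `data` whose (group1, group2) pair appears in `pairs`,
# where `pairs` is a list of length-2 character vectors.
filter_pairs <- function(data, pairs) {
  # Build a lookup table with one row per requested comparison
  wanted <- data.frame(
    group1 = vapply(pairs, `[`, character(1), 1),
    group2 = vapply(pairs, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
  # semi_join keeps the rows of `data` that have a match in `wanted`
  # and adds no columns from `wanted`
  semi_join(data, wanted, by = c("group1", "group2"))
}

# Mock data from the question
df <- data.frame(
  gene = "ABCD1",
  group1 = c("Control", "Control", "Control", "Drug_1", "Drug_1", "Drug_2"),
  group2 = c("Drug_1", "Drug_2", "Drug_3", "Drug_2", "Drug_3", "Drug_3"),
  p_value = c(0.4, 0.001, 0.003, 0.01, 0.3, 0.9),
  stringsAsFactors = FALSE
)
comparisons <- list(c("Control", "Drug_1"), c("Control", "Drug_2"), c("Drug_2", "Drug_3"))

filtered_df <- filter_pairs(df, comparisons)
filtered_df
```

With the mock data this keeps the three rows matching the listed pairs. Note that the match is order-sensitive: a pair given as c("Drug_1", "Control") would not match a row with group1 = Control and group2 = Drug_1.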
Upvotes: 2