Reputation: 33
I have a list of pairwise comparisons of closely related isolates, one element of the list looks like this:
df <- data.frame(Isolate1 = c("s1", "s2", "s2"), Ward1.x = c("1_1", "3_3", "3_3"), Ward1.y = c("NA", "2_1", "2_1"), Ward1.z = c("NA", "NA", "NA"),
Isolate2 = c("s3", "s1", "s3"), Ward2.x = c("2_1", "1_1", "2_1"), Ward2.y = c("NA", "NA", "NA"), Ward2.z = c("NA", "NA", "NA"))
Isolate1 Ward1.x Ward1.y Ward1.z Isolate2 Ward2.x Ward2.y Ward2.z
1 s1 1_1 NA NA s3 2_1 NA NA
2 s2 3_3 2_1 NA s1 1_1 NA NA
3 s2 3_3 2_1 NA s3 2_1 NA NA
And I now want to find out, if there are pairwise comparisons where both isolates were sampled from the same ward. Hence, I want to check, if there are any duplicates between columns Ward1.x-Ward1.z and Ward2.x-Ward2.z.
So my output should be a dataframe that only includes the last row of df, as Ward1.y and Ward2.x are identical, like this:
Isolate1 Ward1.x Ward1.y Ward1.z Isolate2 Ward2.x Ward2.y Ward2.z
3 s2 3_3 2_1 NA s3 2_1 NA NA
I know how to subset a dataframe based on equality of two columns, but how can I check for equality between multiple columns?
Upvotes: 0
Views: 152
Reputation: 388807
One way using apply
would be to filter rows where there is atleast one non-NA value which is common in "Ward1"
and "Ward2"
columns.
col1 <- grep("^Ward1", names(df))
col2 <- grep("^Ward2", names(df))
df[apply(df, 1, function(x) any(na.omit(x[col1]) %in% na.omit(x[col2]))), ]
# Isolate1 Ward1.x Ward1.y Ward1.z Isolate2 Ward2.x Ward2.y Ward2.z
#3 s2 3_3 2_1 NA s3 2_1 NA NA
Using similar logic with dplyr
and tidyr
we can do
library(dplyr)
library(tidyr)
df %>%
mutate(row = row_number()) %>%
gather(key, value, -starts_with("Iso"), -row) %>%
group_by(row) %>%
filter(any(na.omit(value[grep("Ward1", key)]) %in%
na.omit(value[grep("Ward2", key)]))) %>%
spread(key, value)
# A tibble: 1 x 9
# Groups: row [1]
# Isolate1 Isolate2 row Ward1.x Ward1.y Ward1.z Ward2.x Ward2.y Ward2.z
# <chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1 s2 s3 3 3_3 2_1 NA 2_1 NA NA
We can also use intersect
df[apply(df, 1, function(x) length(na.omit(intersect(x[col1], x[col2])))) > 0, ]
Upvotes: 1