Reputation: 5088
Assume you have an arbitrary number of vectors. Now you want to compare which elements co-occur between which vectors. For a small number of vectors this is easy to do "manually", e.g.:
a <- c("a", "b", "c")
b <- c("d", "e", "f")
c <- c("g", "h", "i")
a %in% b
a %in% c
b %in% c
However, as the number of vectors grows, this quickly becomes unwieldy. Is there some nifty and generalizable solution to these kinds of comparisons?
Upvotes: 3
Views: 1630
Reputation: 44525
Start by putting all of your vectors in a list, which will make them easier to work with. I imagine you then just want to know if each element of each vector appears in any of the other vectors. You can do that with a simple leave-one-out comparison of each vector to all the other vectors in the list:
x <- list(a, b, c)
lapply(seq_along(x), function(n) x[[n]] %in% unlist(x[-n]))
# [[1]]
# [1] FALSE FALSE FALSE
#
# [[2]]
# [1] FALSE FALSE FALSE
#
# [[3]]
# [1] FALSE FALSE FALSE
In the above structure, each vector is compared against the combined values of all the other vectors. So the first list element is a three-element logical vector indicating whether each element of vector a is found anywhere in b or c, and so forth.
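Because the vectors in the question share no values, every result above is FALSE. A minimal sketch with overlapping data makes the behaviour easier to see. The values below are hypothetical and not from the question; naming the list and labelling the results with setNames() is an optional convenience added here for readability.

```r
# Hypothetical example data with some shared values (not from the question)
x <- list(a = c("a", "b", "c"),
          b = c("b", "d", "e"),
          c = c("c", "e", "f"))

# For each vector, flag the elements that also occur in any other vector.
# setNames() labels each logical with the value it refers to.
shared <- lapply(seq_along(x), function(n) {
  setNames(x[[n]] %in% unlist(x[-n]), x[[n]])
})
names(shared) <- names(x)

shared
```

Here shared$a should flag "b" and "c" as TRUE and "a" as FALSE, since only "b" and "c" occur in the other two vectors.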
If you need to do every pairwise comparison of vectors, you can do:
apply(combn(seq_along(x), 2), 2, function(n) x[[n[1]]] %in% x[[n[2]]])
#       [,1]  [,2]  [,3]
# [1,] FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE
# [3,] FALSE FALSE FALSE
In this structure, each column corresponds to one pair of vectors, with the pairs given by the columns of combn(seq_along(x), 2):
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3
So the first column indicates whether each element of vector a is found in b, the second column indicates whether each element of a is found in c, etc.
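Note that the apply() call above returns a tidy matrix only when all vectors have the same length, and it checks each pair in one direction only (a in b, but not b in a). If that is a limitation, one possible variation, sketched here with hypothetical data of unequal lengths, is to count the shared elements for every pair with intersect(), and to keep the per-pair shared values in a named list:

```r
# Hypothetical example data with unequal lengths (not from the question)
x <- list(a = c("a", "b", "c"),
          b = c("b", "d", "e", "g"),
          c = c("c", "e", "f"))

# Symmetric matrix: number of distinct elements shared by each pair of vectors
# (the diagonal holds the number of distinct elements in each vector)
overlap <- sapply(x, function(u) sapply(x, function(v) length(intersect(u, v))))

# The shared elements themselves, one list entry per unordered pair
pairs <- combn(names(x), 2, simplify = FALSE)
common <- lapply(pairs, function(p) intersect(x[[p[1]]], x[[p[2]]]))
names(common) <- sapply(pairs, paste, collapse = "_")

overlap
common
```

The matrix gives a quick overview of which vectors overlap at all, while the common list shows exactly which values each pair has in common.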
Upvotes: 3