Comparing multiple vectors

Question

Assume you have an arbitrary number of vectors. Now you want to compare which elements co-occur between which vectors. For a small number of vectors this is easy to do "manually", e.g.:

a <- c("a", "b", "c")
b <- c("d", "e", "f")
c <- c("g", "h", "i")

a %in% b
a %in% c
b %in% c

However, as the number of vectors grows, this quickly becomes unwieldy. Is there some nifty and generalizable solution to these kinds of comparisons?

Thomas · Accepted Answer

Start by putting all of your vectors in a list, which will make them easier to work with. I imagine you then just want to know if each element of each vector appears in any of the other vectors. You can do that with a simple leave-one-out comparison of each vector to all the other vectors in the list:

x <- list(a, b, c)
lapply(seq_along(x), function(n) x[[n]] %in% unlist(x[-n]))
# [[1]]
# [1] FALSE FALSE FALSE
# 
# [[2]]
# [1] FALSE FALSE FALSE
# 
# [[3]]
# [1] FALSE FALSE FALSE

In the above structure, each vector is compared against all other values in all other vectors (combined). So the first list element is a three-element vector indicating whether each element of a is found anywhere in b or c, and so forth.
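Because the example vectors share no values, every result above is FALSE. As a rough illustration of what the same leave-one-out comparison returns when values do overlap, here is a sketch using made-up vectors (a2, b2, c2 and x2 are hypothetical names, not part of the question):

```r
# Hypothetical vectors with some shared values, for illustration only
a2 <- c("a", "b", "c")
b2 <- c("b", "c", "d")
c2 <- c("c", "e", "f")

x2 <- list(a2, b2, c2)

# For each vector, test its elements against all other vectors combined
res <- lapply(seq_along(x2), function(n) x2[[n]] %in% unlist(x2[-n]))
res
# [[1]]
# [1] FALSE  TRUE  TRUE
#
# [[2]]
# [1]  TRUE  TRUE FALSE
#
# [[3]]
# [1]  TRUE FALSE FALSE
```

Here "b" and "c" from a2 turn up elsewhere in the list, while "a" does not.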

If you need to do every pairwise comparison of vectors, you can do:

apply(combn(seq_along(x), 2), 2, function(n) x[[n[1]]] %in% x[[n[2]]])
#       [,1]  [,2]  [,3]
# [1,] FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE
# [3,] FALSE FALSE FALSE

In this structure, each column relates to a comparison of the vectors given by combn(seq_along(x), 2):

     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3

So the first column indicates whether each element of a is found in b, the second column indicates whether each element of a is found in c, and the third column indicates whether each element of b is found in c.
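Two caveats may be worth keeping in mind: apply() only simplifies to a matrix like the one above when all vectors have the same length, and %in% is directional (a %in% b is not the same question as b %in% a). If what you want is simply which values each pair has in common, one possible variant is sketched below. It assumes a named list and uses intersect(), which is symmetric; the names x3, pairs and shared are hypothetical and chosen only for this example:

```r
# Hypothetical named vectors of unequal length, for illustration only
x3 <- list(a = c("a", "b", "c"),
           b = c("b", "c"),
           c = c("c", "e", "f", "g"))

# All unordered pairs of names, kept as a list of length-2 character vectors
pairs <- combn(names(x3), 2, simplify = FALSE)

# Values shared by each pair; the result is a list, so lengths may differ
shared <- lapply(pairs, function(p) intersect(x3[[p[1]]], x3[[p[2]]]))

# Label each result with the pair it belongs to, e.g. "a_b"
names(shared) <- vapply(pairs, paste, character(1), collapse = "_")
shared
```

Keeping the result as a named list avoids relying on apply()'s simplification, so it should behave the same whether or not the vectors are equally long.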
