Darshan

Reputation: 25

Compare a list with a multi list in R

I have a list of vectors q like this:

[[1]]
[1]   1   2   3   4   5   6  10  12  15  20  32  49  57  74 100

[[2]]
[1]  1  2  3 13 27

[[3]]
[1]  4 21 73

[[4]]
[1]  1  2  3  4 11 25 28 42

[[5]]
[1]  1  2  3  4 26

[[6]]
[1]  1  2  3 11

and I have another vector d:

[1]  5 11 14 18 38 61

How do I compare d with every element of q? I need something like length(intersect(q, d)) applied element-wise, so that the result has length(q) entries: the number of values that d has in common with each vector in q. q has around a million elements, so what is an efficient way to implement this? Edit: the desired output should look like this:

1 0 0 1 0 1

d shares exactly one value with each of q[[1]], q[[4]] and q[[6]], so those positions are 1 and the others are 0.

Upvotes: 0

Views: 120

Answers (1)

Rich Scriven

Reputation: 99331

You can use vapply

vapply(q, function(x) length(intersect(x, d)), 1L)
# [1] 1 0 0 1 0 1

Not sure, but it might be faster to do

vapply(q, function(x) sum(x %in% d), 1L)
# [1] 1 0 0 1 0 1

And it turns out that sum(x %in% d) is considerably faster, by roughly a factor of 8 in this benchmark:

qq <- rep(q, 1e4)
length(qq)
# [1] 60000

f <- function() vapply(qq, function(x) length(intersect(x, d)), 1L)
g <- function() vapply(qq, function(x) sum(x %in% d), 1L)

library(microbenchmark)
microbenchmark(f(), g(), times = 10, unit = "relative")
# Unit: relative
#  expr    min       lq     mean   median       uq      max neval cld
#   f() 8.4694 8.466754 8.311812 8.557292 8.447665 7.095008    10   b
#   g() 1.0000 1.000000 1.000000 1.000000 1.000000 1.000000    10  a 


identical(f(), g())
# [1] TRUE
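One caveat: f() and g() agree here only because no vector in q repeats a value. intersect() drops duplicates, while sum(x %in% d) counts every matching element, so the two can differ on data with repeats. A small illustration, using a made-up vector x that is not part of the question's data:

```r
d <- c(5, 11, 14, 18, 38, 61)
x <- c(5, 5, 11)                         # hypothetical vector with a repeated value

n_intersect <- length(intersect(x, d))   # distinct common values
n_in        <- sum(x %in% d)             # every matching element, repeats included
n_unique_in <- sum(unique(x) %in% d)     # deduplicate first to match intersect()

c(n_intersect, n_in, n_unique_in)
# [1] 2 3 2
```

If the vectors in q can contain repeats and you want the intersect() semantics, the unique() variant is probably the one to use.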

Where the original q list and d vector are

q <- list(c(1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 32, 49, 57, 74, 100),
          c(1, 2, 3, 13, 27), c(4, 21, 73), c(1, 2, 3, 4, 11, 25, 28, 42),
          c(1, 2, 3, 4, 26), c(1, 2, 3, 11))
d <- c(5, 11, 14, 18, 38, 61)
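For a list with around a million elements, it might also be worth avoiding the per-element function call altogether. The following is only a sketch of that idea and has not been benchmarked here: flatten q once, run a single %in% over all values, and count the matches per original list position with tabulate(). The names flat, grp and counts are just illustrative. Like sum(x %in% d), it counts repeated values more than once.

```r
q <- list(c(1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 32, 49, 57, 74, 100),
          c(1, 2, 3, 13, 27), c(4, 21, 73), c(1, 2, 3, 4, 11, 25, 28, 42),
          c(1, 2, 3, 4, 26), c(1, 2, 3, 11))
d <- c(5, 11, 14, 18, 38, 61)

# All values of q in one long vector
flat <- unlist(q, use.names = FALSE)

# For each value in flat, the index of the list element it came from
grp <- rep.int(seq_along(q), lengths(q))

# Keep the group index of every value found in d, then count per group;
# nbins makes sure list elements with no match still get a 0
counts <- tabulate(grp[flat %in% d], nbins = length(q))
counts
# [1] 1 0 0 1 0 1
```

Whether this beats the vapply version will depend on the data, so it is worth timing both with microbenchmark on a realistic q before choosing one.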

Upvotes: 1
