Reputation: 25
I have a list of vectors q, like this:
[[1]]
[1] 1 2 3 4 5 6 10 12 15 20 32 49 57 74 100
[[2]]
[1] 1 2 3 13 27
[[3]]
[1] 4 21 73
[[4]]
[1] 1 2 3 4 11 25 28 42
[[5]]
[1] 1 2 3 4 26
[[6]]
[1] 1 2 3 11
and I have another vector d:
[1] 5 11 14 18 38 61
Now how do I compare d with every element of q?
I need something like length(intersect(q, d)) that returns a vector of length(q): the number of terms shared between d and each element of q.
The length of q is around a million, so what is an efficient way to implement this?
Edit: the desired output should look like this:
1 0 0 1 0 1
There is exactly one common item between d and each of q[[1]], q[[4]], and q[[6]], so the output is 1 at those positions and 0 elsewhere.
Upvotes: 0
Views: 120
Reputation: 99331
You can use vapply:
vapply(q, function(x) length(intersect(x, d)), 1L)
# [1] 1 0 0 1 0 1
Not sure, but it might be faster to do
vapply(q, function(x) sum(x %in% d), 1L)
# [1] 1 0 0 1 0 1
... And it turns out that it is considerably faster to use sum(x %in% d), roughly 8 times faster on a 60,000-element list:
qq <- rep(q, 1e4)
length(qq)
# [1] 60000
f <- function() vapply(qq, function(x) length(intersect(x, d)), 1L)
g <- function() vapply(qq, function(x) sum(x %in% d), 1L)
library(microbenchmark)
microbenchmark(f(), g(), times = 10, unit = "relative")
# Unit: relative
#  expr    min       lq     mean   median       uq      max neval cld
#   f() 8.4694 8.466754 8.311812 8.557292 8.447665 7.095008    10   b
#   g() 1.0000 1.000000 1.000000 1.000000 1.000000 1.000000    10  a
identical(f(), g())
# [1] TRUE
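One caveat worth checking on your real data: the two versions agree here because no vector in q contains repeated values. intersect() drops duplicates, while sum(x %in% d) counts every matching position. A small sketch with made-up vectors (x_dup and d_small are illustrative names, not from the question):

```r
# Hypothetical vectors, chosen only to show the duplicate case
x_dup   <- c(1, 1, 2)
d_small <- c(1, 3)

length(intersect(x_dup, d_small))  # 1: intersect() de-duplicates first
sum(x_dup %in% d_small)            # 2: both copies of 1 are counted
sum(unique(x_dup) %in% d_small)    # 1: matches intersect() again
```

If the vectors in q can contain repeats and you want distinct common items, wrapping x in unique() should restore the intersect() behaviour, at some extra cost.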
Where the original q list and the d vector from the question are
q <- list(c(1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 32, 49, 57, 74, 100),
          c(1, 2, 3, 13, 27), c(4, 21, 73), c(1, 2, 3, 4, 11, 25, 28, 42),
          c(1, 2, 3, 4, 26), c(1, 2, 3, 11))
d <- c(5, 11, 14, 18, 38, 61)
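Since q has around a million elements, it may also be worth trying a version that avoids calling a function once per element. The following is only a sketch, not something benchmarked in this answer: it flattens q, does a single %in% lookup, and counts the hits per original element with tabulate(). The names idx, hits, and counts are illustrative, and lengths() needs R 3.2.0 or later.

```r
q <- list(c(1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 32, 49, 57, 74, 100),
          c(1, 2, 3, 13, 27), c(4, 21, 73), c(1, 2, 3, 4, 11, 25, 28, 42),
          c(1, 2, 3, 4, 26), c(1, 2, 3, 11))
d <- c(5, 11, 14, 18, 38, 61)

# For every flattened value, record which element of q it came from
idx <- rep(seq_along(q), lengths(q))

# One vectorised membership test over all values at once
hits <- unlist(q) %in% d

# Count the hits per element of q; nbins keeps zero-hit elements in the result
counts <- tabulate(idx[hits], nbins = length(q))
counts
# [1] 1 0 0 1 0 1
```

Like sum(x %in% d), this counts repeated values separately, and whether it actually beats the vapply version will depend on your data, so time it on a realistic sample first.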
Upvotes: 1