I have multiple lists of genes, for example:
listA <- c("geneA", "geneB", "geneC")
listB <- c("geneA", "geneB", "geneD", "geneE")
listC <- c("geneB", "geneF")
...
I'd like to get a table showing the number of overlapping elements between each pair of lists, like:
      listA listB listC ...
listA     3     2     1
listB     2     4     1
listC     1     1     2
...
I know how to get the number of overlaps for a single pair, e.g. length(intersect(listA, listB)). But is there an easier way to generate the whole overlap table?
Here is a way in base R:
crossprod(table(stack(mget(ls(pattern = "^list")))))
#        ind
# ind     listA listB listC
#   listA     3     2     1
#   listB     2     4     1
#   listC     1     1     2
mget(ls(pattern = "^list")) returns a named list of the objects in your global environment whose names begin with "list".
stack then turns this list into the following data frame:
stack(mget(ls(pattern = "^list")))
#   values   ind
# 1  geneA listA
# 2  geneB listA
# 3  geneC listA
# 4  geneA listB
# 5  geneB listB
# 6  geneD listB
# 7  geneE listB
# 8  geneB listC
# 9  geneF listC
Calling table on this data frame returns a gene-by-list incidence table, with a 1 wherever a gene occurs in a list:
out <- table(stack(mget(ls(pattern = "^list"))))
out
#        ind
# values  listA listB listC
#   geneA     1     1     0
#   geneB     1     1     1
#   geneC     1     0     0
#   geneD     0     1     0
#   geneE     0     1     0
#   geneF     0     0     1
crossprod then calculates
t(out) %*% out
which returns
#        ind
# ind     listA listB listC
#   listA     3     2     1
#   listB     2     4     1
#   listC     1     1     2
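The pattern "^list" picks up every object in the global environment whose name starts with "list", which may include unrelated objects. A minimal sketch of the same crossprod idea that avoids this, assuming the vectors are collected in a named list (gene_lists is a hypothetical name, not from the question):

```r
# Hypothetical named list holding the gene vectors from the question
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

# unique() guards against a gene repeated within one list, which would
# otherwise inflate the counts in the incidence table
gene_lists <- lapply(gene_lists, unique)

# Gene-by-list incidence table, then all pairwise column products
incidence <- table(stack(gene_lists))
overlap <- crossprod(incidence)
overlap
```

The diagonal of overlap holds the size of each list and the off-diagonal entries hold the pairwise overlaps.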
Create a list of all the vectors:
list.all <- list(listA, listB, listC)
Then use outer:
outer(list.all, list.all, Vectorize(function(x, y) sum(x %in% y)))
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    2    4    1
# [3,]    1    1    2
Or use nested sapply calls:
sapply(list.all, function(x) sapply(list.all, function(y) sum(y %in% x)))
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    2    4    1
# [3,]    1    1    2
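Because list.all has no names, the matrices above come back with bare row and column indices. A possible variant, assuming a named list instead (gene_lists is a hypothetical name), carries the list names through to the result and uses the length(intersect(...)) count from the question:

```r
# Hypothetical named list; the names become the row and column labels
gene_lists <- list(
  listA = c("geneA", "geneB", "geneC"),
  listB = c("geneA", "geneB", "geneD", "geneE"),
  listC = c("geneB", "geneF")
)

# For every pair of lists, count the distinct shared genes.
# intersect() drops duplicates, so a gene repeated within one list
# is counted only once.
overlap <- sapply(gene_lists, function(x) {
  sapply(gene_lists, function(y) length(intersect(x, y)))
})
overlap
```

The result is a plain integer matrix that can be indexed by name, e.g. overlap["listA", "listB"].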