Forest

Reputation: 721

Plot the intersection of every two list elements

Given a list of 16 elements, where each element is a named numeric vector, I want to plot the length of the intersection of names between every two consecutive elements. That is, the intersection of element 1 with element 2, that of element 3 with element 4, and so on.

Although I can do this in a very tedious, low-throughput manner, I'll have to repeat this sort of analysis, so I'd like a more programmatic way of doing it.

As an example, the first 5 entries of the first 2 list elements are:

topGenes[[1]][1:5]

3398   284353   219293     7450    54658 
2.856363 2.654106 2.653845 2.635599 2.626518 

topGenes[[2]][1:5]
1300    64581     2566     5026   146433 
2.932803 2.807381 2.790484 2.739735 2.705030 

Here, the first row of each output holds the gene IDs (the vector names). I want to know how many gene IDs each pair of vectors (treatment replicates) has in common among, say, the top 100.

I've tried using lapply() in the following manner:

vectorOfIntersectLengths <- lapply(topGenes, function(x) lapply(topGenes, function(y) length(intersect(names(x)[1:100],names(y)[1:100]))))

This only seems to operate on the first two elements: topGenes[[1]] & topGenes[[2]].
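One way to check which combinations the nested call actually covers is to collapse the result into a matrix with sapply() instead of a list of lists. The sketch below is only an illustration: the topGenes built here is made-up stand-in data (16 named vectors, sorted in decreasing order), and top_names and overlap are names introduced for the example.

```r
# Stand-in data (an assumption): 16 named numeric vectors, sorted decreasingly
set.seed(42)
topGenes <- lapply(1:16, function(i) {
  v <- sort(rexp(150), decreasing = TRUE)
  names(v) <- sample.int(1000, 150)
  v
})

# Names of the top 100 entries of each element
top_names <- lapply(topGenes, function(x) names(x)[1:100])

# 16 x 16 matrix: entry [r, c] is the overlap between elements r and c
overlap <- sapply(top_names, function(x) {
  sapply(top_names, function(y) length(intersect(x, y)))
})

dim(overlap)
overlap[1, 2]  # overlap between element 1 and element 2
```

With a matrix, the diagonal holds each element's overlap with itself (100 here), and the consecutive pairs of interest sit at [1, 2], [3, 4], and so on.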

I've also been trying to do this with a for() loop, but I'm unsure how to write this. Something along the lines of this:

lens <- c()
for(i in 1:length(topGenes)){
  lens[i] <- length(intersect(names(topGenes[[i]][1:200]),
                              names(topGenes[[i+1]][1:200])))
}

This returns a 'subscript out of bounds' error, which I don't really understand.
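A likely cause of the error: on the last iteration i equals length(topGenes), so topGenes[[i+1]] asks for an element that does not exist. A minimal sketch of one way around it is to loop over the odd indices only, which also matches the 1-with-2, 3-with-4 pairing. The topGenes below is made-up stand-in data, and firstOfPair is a name introduced for the example.

```r
# Stand-in data (an assumption): 16 named numeric vectors, sorted decreasingly
set.seed(1)
topGenes <- lapply(1:16, function(i) {
  v <- sort(rexp(200), decreasing = TRUE)
  names(v) <- sample.int(1000, 200)
  v
})

# Visit only indices 1, 3, 5, ... so that i + 1 never exceeds length(topGenes)
firstOfPair <- seq(1, length(topGenes) - 1, by = 2)
lens <- integer(length(firstOfPair))

for (k in seq_along(firstOfPair)) {
  i <- firstOfPair[k]
  lens[k] <- length(intersect(names(topGenes[[i]])[1:100],
                              names(topGenes[[i + 1]])[1:100]))
}

lens  # one overlap count per consecutive pair
```

Pre-allocating lens with integer() avoids growing the vector inside the loop, and the result has one entry per pair (8 for a list of 16).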

Thanks a lot for any help!

Upvotes: 2

Views: 550

Answers (1)

C8H10N4O2

Reputation: 18995

Is this what you're looking for?

# make some fake data
set.seed(123)
some_list <- lapply(1:16, function(x) {
  y <- rexp(100)
  names(y) <- sample.int(1000,100)
  y
})

# identify all possible pairs
pairs <- t( combn(length(some_list), 2) )
# note: you could also use:  pairs <- expand.grid(1:length(some_list),1:length(some_list))
# but in addition to a-to-b, you'd get b-to-a, a-to-a, and b-to-b

# get the intersection of names of a pair of elements with given indices kept for bookkeeping
get_intersection <- function(a,b) {
  list(a = a, b = b, 
       intersection = intersect( names(some_list[[a]]), names(some_list[[b]]) ) 
  )
}

# get intersection for each pair
intersections <- mapply(get_intersection, a = pairs[,1], b = pairs[,2], SIMPLIFY=FALSE)

# print the intersections
for(indx in seq_along(intersections)){
  writeLines(paste('Intersection of', intersections[[indx]]$a, 'and',
                   intersections[[indx]]$b, 'contains:', 
                   paste( sort(intersections[[indx]]$intersection), collapse=', ') ) )
}
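If only the consecutive pairs from the question (1 with 2, 3 with 4, ...) are needed, along with a plot of the overlap sizes, something like the following sketch may be closer. It reuses the fake data from above; overlap_size, odd and pair_overlap are names made up for this example, and "top n" is taken to mean the first n entries, which assumes each vector is already sorted as in the question.

```r
# same fake data as above
set.seed(123)
some_list <- lapply(1:16, function(x) {
  y <- rexp(100)
  names(y) <- sample.int(1000, 100)
  y
})

# hypothetical helper: number of shared names among the first n entries
overlap_size <- function(u, v, n = 100) {
  length(intersect(head(names(u), n), head(names(v), n)))
}

# indices 1, 3, 5, ... are paired with 2, 4, 6, ...
odd <- seq(1, length(some_list) - 1, by = 2)
pair_overlap <- mapply(overlap_size, some_list[odd], some_list[odd + 1])
names(pair_overlap) <- paste(odd, odd + 1, sep = "-")

# one bar per pair
barplot(pair_overlap, xlab = "Pair of list elements",
        ylab = "Shared names", las = 2)
```

mapply() walks the two sub-lists in parallel, so each call sees one odd-indexed element and its even-indexed partner.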

Upvotes: 1
