Reputation: 1446
I have a list that looks like this one:
$`264`
[1] "CHAMP1" "MAP1S" "PRRC1" "TUT1" "CDK12"
$`265`
[1] "TUT1" "PRRC1" "CHAMP1" "MAP1S"
$`266`
[1] "REPS1" "CHAMP1" "PRRC1" "TUT1" "MAP1S"
$`267`
[1] "G3BP1" "TUT1" "PRRC1" "CHAMP1" "MAP1S"
$`268`
[1] "TUT1" "CHAMP1" "PRRC1" "MAP1S"
$`269`
[1] "DDB1" "CHAMP1" "TUT1" "PRRC1" "MAP1S"
Is there any package or function to calculate the similarity among the different list components?
Many thanks
Upvotes: 1
Views: 1122
Reputation: 59345
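I'm not aware of any packages, but this implements your own metric (from your comment): the size of the intersection divided by the size of the union, i.e. the Jaccard index. It assumes your list is stored in lst: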
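# Jaccard index of elements x and y of lst (lst is looked up in the enclosing environment)
siml <- function(x,y) {
  length(intersect(lst[[x]],lst[[y]]))/length(union(lst[[x]],lst[[y]]))
}
# every (x, y) pair of indices; seq_along() is also safe for an empty list
z <- expand.grid(x=seq_along(lst),y=seq_along(lst))
result <- mapply(siml,z$x,z$y)
# reshape the flat vector into a square similarity matrix
dim(result) <- c(length(lst),length(lst))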
result
#       [,1] [,2]  [,3]  [,4] [,5]  [,6]
# [1,] 1.000  0.8 0.667 0.667  0.8 0.667
# [2,] 0.800  1.0 0.800 0.800  1.0 0.800
# [3,] 0.667  0.8 1.000 0.667  0.8 0.667
# [4,] 0.667  0.8 0.667 1.000  0.8 0.667
# [5,] 0.800  1.0 0.800 0.800  1.0 0.800
# [6,] 0.667  0.8 0.667 0.667  0.8 1.000
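To see where a single entry comes from, here is a minimal sketch of the metric for one pair, with the two vectors copied from elements `264` and `265` in the question. The names `a`, `b`, `shared` and `pooled` are only illustrative.

```r
a <- c("CHAMP1", "MAP1S", "PRRC1", "TUT1", "CDK12")  # element `264`
b <- c("TUT1", "PRRC1", "CHAMP1", "MAP1S")           # element `265`

shared <- intersect(a, b)  # genes present in both vectors (4 of them)
pooled <- union(a, b)      # genes present in either vector (5 of them)

length(shared) / length(pooled)  # 4 / 5 = 0.8
```

This should correspond to the [1,2] and [2,1] entries of the matrix.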
This is a (slightly) more efficient way to do the same thing:
result <- sapply(lst, function(x)
  sapply(lst, function(y) length(intersect(x, y)) / length(union(x, y))))
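For anyone who wants to run this as-is, the sketch below rebuilds the list from the question (the code above assumes it already exists as `lst`) and moves the metric into a small helper. The helper name `jaccard` and the result name `sim` are my own choices, not functions from a package.

```r
# The list from the question, rebuilt by hand
lst <- list(
  `264` = c("CHAMP1", "MAP1S", "PRRC1", "TUT1", "CDK12"),
  `265` = c("TUT1", "PRRC1", "CHAMP1", "MAP1S"),
  `266` = c("REPS1", "CHAMP1", "PRRC1", "TUT1", "MAP1S"),
  `267` = c("G3BP1", "TUT1", "PRRC1", "CHAMP1", "MAP1S"),
  `268` = c("TUT1", "CHAMP1", "PRRC1", "MAP1S"),
  `269` = c("DDB1", "CHAMP1", "TUT1", "PRRC1", "MAP1S")
)

# Hypothetical helper: intersection size over union size for two vectors
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Pairwise similarities; sapply keeps the list names as row and column names
sim <- sapply(lst, function(x) sapply(lst, jaccard, b = x))

round(sim, 3)
```

Because the rows and columns carry the list names, individual pairs can be looked up directly, e.g. `sim["264", "265"]`.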
Upvotes: 1