similarity index in a list of character vectors

Question

I have a list that looks like this one:

$`264`
[1] "CHAMP1" "MAP1S"  "PRRC1"  "TUT1"   "CDK12" 

$`265`
[1] "TUT1"   "PRRC1"  "CHAMP1" "MAP1S"

$`266`
[1] "REPS1"  "CHAMP1" "PRRC1"  "TUT1"   "MAP1S" 

$`267`
[1] "G3BP1"  "TUT1"   "PRRC1"  "CHAMP1" "MAP1S" 

$`268`
[1] "TUT1"   "CHAMP1" "PRRC1"  "MAP1S"  

$`269`
[1] "DDB1"   "CHAMP1" "TUT1"   "PRRC1"  "MAP1S"

Is there any package or function to calculate the similarity among the different list components?

Many thanks

jlhoward · Accepted Answer

I'm not aware of any packages, but this implements your own metric (from your comment):

siml  <- function(x,y) {
  length(intersect(lst[[x]],lst[[y]]))/length(union(lst[[x]],lst[[y]]))
}
z      <- expand.grid(x=1:length(lst),y=1:length(lst))
result <- mapply(siml,z$x,z$y)
dim(result) <- c(length(lst),length(lst))
result
#       [,1] [,2]  [,3]  [,4] [,5]  [,6]
# [1,] 1.000  0.8 0.667 0.667  0.8 0.667
# [2,] 0.800  1.0 0.800 0.800  1.0 0.800
# [3,] 0.667  0.8 1.000 0.667  0.8 0.667
# [4,] 0.667  0.8 0.667 1.000  0.8 0.667
# [5,] 0.800  1.0 0.800 0.800  1.0 0.800
# [6,] 0.667  0.8 0.667 0.667  0.8 1.000

This is a (slightly) more efficient way to do the same thing:

result <- sapply(lst,function(x) 
            sapply(lst,function(y,x)length(intersect(x,y))/length(union(x,y)),x))

similarity index in a list of character vectors

Answers (1)

Related Questions