aldorado

Reputation: 4844

Represent similarities between lists in R

I have a lot of lists, each containing a different number of nominal (categorical) elements. I want to compare every list with every other list and, for each pair, count how many elements the two lists share. I am no statistician, but I imagine the outcome is most easily represented as a matrix.

list1 <- c("Joe", "Hanna", "Alice")
list2 <- c("Martin", "Ted", "Joe")
list3 <- c("Hanna", "Ted", "Joe")

[Image: Similarities]

Afterwards I would like to represent the outcome graphically, maybe using a heatmap or a cluster representation.

Can anybody give me some hints on how to do this in R? What other representations would work well? Thanks a lot!

Upvotes: 0

Views: 769

Answers (3)

TWL

Reputation: 2300

I would recommend using sapply in this case:

data <- list(list1=c("Joe","Hanna","Alice"), 
             list2=c("Martin","Ted","Joe"), 
             list3=c("Hanna","Ted","Joe"))

mat <- sapply(data, function(x) sapply(data, function(y) length(intersect(x,y))))

print(mat)

#       list1 list2 list3
# list1     3     1     2
# list2     1     3     2
# list3     2     2     3
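Because the lists can differ in length, raw overlap counts tend to favour long lists. One possible extension (an assumption on my part, not something the question requires) is to divide each count by the size of the union of the two lists (Jaccard similarity) and then cluster on the distance 1 - Jaccard with hclust() to get a dendrogram. The names `shared`, `sizes`, `jaccard` and `hc`, and the choice of average linkage, are illustrative only.

```r
# Same example data as above
data <- list(list1 = c("Joe", "Hanna", "Alice"),
             list2 = c("Martin", "Ted", "Joe"),
             list3 = c("Hanna", "Ted", "Joe"))

# Pairwise counts of shared elements (same nested sapply as above)
shared <- sapply(data, function(x) sapply(data, function(y) length(intersect(x, y))))

# Number of distinct elements in each list
sizes <- sapply(data, function(x) length(unique(x)))

# Jaccard similarity: |A and B| / |A or B|, with |A or B| = |A| + |B| - |A and B|
jaccard <- shared / (outer(sizes, sizes, "+") - shared)
print(round(jaccard, 2))

# Hierarchical clustering on the distance 1 - Jaccard (average linkage is an assumption)
hc <- hclust(as.dist(1 - jaccard), method = "average")
plot(hc, main = "Lists clustered by Jaccard distance", xlab = "", sub = "")
```

Other linkage methods (for example `method = "complete"`) or other normalisations are equally valid choices; which one is appropriate depends on what "similar" should mean for your data.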

See the heatmap() function (stats package) or heatmap.2() (gplots package) for a clustered representation, or try ggplot2 for nicer visual output and a legend with discrete colour coding:

library(reshape2)  # for melt()
library(ggplot2)

# melt() turns the matrix into a long data frame with columns Var1, Var2 and value
df <- melt(mat)

ggplot(data = df, aes(x = Var1, y = Var2)) +
  geom_tile(aes(fill = factor(value))) +
  scale_fill_brewer(palette = "Blues") +
  theme(axis.title = element_blank(), legend.title = element_blank())

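For the clustered view, base R's heatmap() may already be enough without extra packages. A minimal sketch, assuming the `data` list from this answer: `symm = TRUE` tells heatmap() the matrix is symmetric so rows and columns share one ordering, and `scale = "none"` keeps the raw counts instead of standardising each row. Note that the default clustering applies dist() to the rows of counts, i.e. it uses Euclidean distance between count profiles rather than treating the counts themselves as similarities.

```r
data <- list(list1 = c("Joe", "Hanna", "Alice"),
             list2 = c("Martin", "Ted", "Joe"),
             list3 = c("Hanna", "Ted", "Joe"))
mat <- sapply(data, function(x) sapply(data, function(y) length(intersect(x, y))))

# heatmap() reorders rows/columns by hierarchical clustering and draws the dendrograms;
# it invisibly returns the row/column ordering it used
hm <- heatmap(mat, symm = TRUE, scale = "none")

# List names in the order they appear in the plot
print(rownames(mat)[hm$rowInd])
```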

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

You can use crossprod, table, and stack (assuming your data is in the form that TWL shared):

data <- list(list1=c("Joe","Hanna","Alice"), 
             list2=c("Martin","Ted","Joe"), 
             list3=c("Hanna","Ted","Joe"))
crossprod(table(stack(data)))
#        ind
# ind     list1 list2 list3
#   list1     3     1     2
#   list2     1     3     2
#   list3     2     2     3

Wrap that in heatmap if you're looking for a heatmap :-)
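The one-liner can be easier to follow when split into its steps. A sketch using the same `data` list (the intermediate names `long`, `incidence` and `shared` are mine): stack() turns the named list into a long data frame with columns `values` and `ind`, table() cross-tabulates that into an element-by-list incidence matrix, and crossprod(X) computes t(X) %*% X. Entry [i, j] of that product sums, over all elements, the count in list i times the count in list j, which is the number of shared elements when no element repeats within a list.

```r
data <- list(list1 = c("Joe", "Hanna", "Alice"),
             list2 = c("Martin", "Ted", "Joe"),
             list3 = c("Hanna", "Ted", "Joe"))

# Long format: one row per (element, list) pair, columns "values" and "ind"
long <- stack(data)

# Incidence matrix: rows are the distinct elements, columns are the lists
incidence <- table(long)
print(incidence)

# crossprod(X) is t(X) %*% X: entry [i, j] counts elements present in both lists
shared <- crossprod(incidence)
print(shared)
```

If an element can appear more than once in the same list, table() counts it more than once and the products inflate the result; applying unique() to each vector first (for example `lapply(data, unique)`) keeps the numbers comparable to the intersect()-based answers.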

Upvotes: 6

fabians

Reputation: 3473

intersect() returns the intersection of two sets:

lists <- list(list1 = c("Joe", "Hanna", "Alice"),
              list2 = c("Martin", "Ted", "Joe"),
              list3 = c("Hanna", "Ted", "Joe"))

n <- length(lists)
result <- matrix(NA_integer_, n, n)
colnames(result) <- rownames(result) <- names(lists)

# Fill the upper triangle (including the diagonal) and mirror it,
# since the number of shared elements is symmetric
for (i in seq_len(n)) {
    for (j in i:n) {
        result[i, j] <- length(intersect(lists[[i]], lists[[j]]))
        result[j, i] <- result[i, j]
    }
}
result
##       list1 list2 list3
## list1     3     1     2
## list2     1     3     2
## list3     2     2     3

image(result) will give a nice graphical representation, for example.
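A sketch of a somewhat more readable image() plot, assuming `result` is the matrix built above (recomputed here with sapply() so the block runs on its own). It adds the list names as axis labels and writes each count into its cell. image(x, y, z) draws z[i, j] at position (x[i], y[j]), so rows run along the horizontal axis and the first row sits at the left.

```r
data <- list(list1 = c("Joe", "Hanna", "Alice"),
             list2 = c("Martin", "Ted", "Joe"),
             list3 = c("Hanna", "Ted", "Joe"))
result <- sapply(data, function(x) sapply(data, function(y) length(intersect(x, y))))
n <- nrow(result)

# Draw the matrix with integer cell centres and no default axes
image(seq_len(n), seq_len(n), result, axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(n), labels = rownames(result))
axis(2, at = seq_len(n), labels = colnames(result), las = 1)

# Write the shared-element count into each cell
cells <- expand.grid(i = seq_len(n), j = seq_len(n))
text(cells$i, cells$j, labels = result[cbind(cells$i, cells$j)])
```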

Upvotes: 3
