Reputation: 4844
I have many lists, each containing a different number of nominal elements. I want to compare every list with every other list and, for every pair, count how many elements the two lists share. I am no statistician, but I imagine the outcome would be most easily represented as a matrix.
list1=["Joe","Hanna","Alice"]
list2=["Martin","Ted","Joe"]
list3=["Hanna","Ted","Joe"]
Afterwards I would like to represent the outcome graphically, maybe using a heatmap or a cluster representation.
Can anybody give me some hints on how to do this in R? What else would be a good representation? Thanks a lot!
Upvotes: 0
Views: 769
Reputation: 2300
I would recommend using sapply in this case:
data <- list(list1=c("Joe","Hanna","Alice"),
list2=c("Martin","Ted","Joe"),
list3=c("Hanna","Ted","Joe"))
mat <- sapply(data, function(x) sapply(data, function(y) length(intersect(x,y))))
print(mat)
#       list1 list2 list3
# list1     3     1     2
# list2     1     3     2
# list3     2     2     3
See the heatmap or heatmap.2 (from the gplots package) functions for a clustered representation, or you can try ggplot2 for a nicer visual output and a legend with discrete colour coding:
library(reshape2)  # provides melt()
library(ggplot2)
df <- melt(mat)    # long format with columns Var1, Var2 and value
ggplot(data=df, aes(x=Var1, y=Var2)) + geom_tile(aes(fill=factor(value))) +
  scale_fill_brewer(palette="Blues") +
  theme(axis.title=element_blank(), legend.title=element_blank())
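For the clustered representation mentioned above, here is a minimal sketch using base R's heatmap() on the same example data. The symm and scale settings are my own choice for a symmetric count matrix, not something the question requires:

```r
# Same example data as above
data <- list(list1 = c("Joe", "Hanna", "Alice"),
             list2 = c("Martin", "Ted", "Joe"),
             list3 = c("Hanna", "Ted", "Joe"))

# Pairwise counts of shared elements
mat <- sapply(data, function(x) sapply(data, function(y) length(intersect(x, y))))

# symm = TRUE treats the matrix as symmetric, so rows and columns get the
# same dendrogram ordering; scale = "none" keeps the raw counts
hm <- heatmap(mat, symm = TRUE, scale = "none")
```

The dendrograms on the margins come from hierarchical clustering of the rows, so lists that overlap in similar ways end up next to each other.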
Upvotes: 1
Reputation: 193517
You can use crossprod, table, and stack (assuming your data is in the form that TWL shared):
data <- list(list1=c("Joe","Hanna","Alice"),
list2=c("Martin","Ted","Joe"),
list3=c("Hanna","Ted","Joe"))
crossprod(table(stack(data)))
#        ind
# ind     list1 list2 list3
#   list1     3     1     2
#   list2     1     3     2
#   list3     2     2     3
Wrap that in heatmap if you're looking for a heatmap :-)
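To see why this one-liner works, it may help to look at the intermediate steps. The sketch below just unpacks the same call; the variable names long, inc and shared are mine. Note that it assumes no name is repeated within a single list, since a repeated name would be counted more than once here, unlike with intersect().

```r
data <- list(list1 = c("Joe", "Hanna", "Alice"),
             list2 = c("Martin", "Ted", "Joe"),
             list3 = c("Hanna", "Ted", "Joe"))

# stack() turns the named list into a data frame with columns
# `values` (the names) and `ind` (which list each name came from)
long <- stack(data)

# table() cross-tabulates names against lists: a 0/1 incidence matrix
# with one row per distinct name and one column per list
inc <- table(long)
print(inc)

# crossprod(inc) is t(inc) %*% inc, so entry [i, j] sums, over all names,
# the product of the two indicators, i.e. the number of shared names
shared <- crossprod(inc)
print(shared)
```

The diagonal of the result is simply the length of each list.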
Upvotes: 6
Reputation: 3473
intersect() returns the intersection of two sets:
list1 <- list("Joe","Hanna","Alice")
list2 <- list("Martin","Ted","Joe")
list3 <- list("Hanna","Ted","Joe")
lists <- list(list1=list1, list2=list2, list3=list3)  # named `lists` to avoid masking base::list
result <- matrix(NA, length(lists), length(lists))
colnames(result) <- rownames(result) <- names(lists)
# fill the upper triangle and mirror it, since the counts are symmetric
for(i in seq_along(lists)){
  for(j in i:length(lists)){
    result[i, j] <- length(intersect(lists[[i]], lists[[j]]))
    result[j, i] <- result[i, j]
  }
}
result
##       list1 list2 list3
## list1     3     1     2
## list2     1     3     2
## list3     2     2     3
image(result) will give a nice graphical representation, for example.
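By default image() draws the cells without list names on the axes. A possible sketch that adds labels and prints the counts in the cells is shown below; the matrix is typed in by hand from the output above so that the snippet runs on its own, and the labelling approach is just one option:

```r
# Result matrix from above, entered by hand so this runs standalone
nms <- c("list1", "list2", "list3")
result <- matrix(c(3, 1, 2,
                   1, 3, 2,
                   2, 2, 3),
                 nrow = 3, byrow = TRUE, dimnames = list(nms, nms))

n <- nrow(result)

# image(x, y, z) draws z[i, j] at position (x[i], y[j]),
# so rows run along the x axis and columns along the y axis
image(seq_len(n), seq_len(n), result, axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(n), labels = rownames(result))
axis(2, at = seq_len(n), labels = colnames(result))

# Write each count into the middle of its cell
text(row(result), col(result), labels = result)
```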
Upvotes: 3