question

Reputation: 497

In R, how can I plot a similarity matrix (like a block graph) after clustering data?

I want to produce a graph that shows the correlation between clustered data and the similarity matrix. How can I do this in R? Is there a function in R that creates a graph like the picture at this link? http://bp0.blogger.com/_VCI4AaOLs-A/SG5H_jm-f8I/AAAAAAAAAJQ/TeLzUEWbb08/s400/Similarity.gif (I just googled and found this link, which shows the kind of graph I want to produce.)

Thanks in advance.

Upvotes: 3

Views: 15963

Answers (1)

Gavin Simpson

Reputation: 174778

The general solutions suggested in the comments by @Chase and @bill_080 need a little bit of enhancement to (partially) fulfil the needs of the OP.

A reproducible example:

library(MASS)  # provides mvrnorm()
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2,6,3), 
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))

Compute the dissimilarity matrix of the standardised data using Euclidean distances:

dij <- dist(scale(dat, center = TRUE, scale = TRUE))

and then calculate a hierarchical clustering of these data using the group average method

clust <- hclust(dij, method = "average")

Next we compute the ordering of the samples on the basis of forming 3 ('k') groups from the dendrogram; a different number of groups could have been chosen here.

ord <- order(cutree(clust, k = 3))
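To make the ordering step easier to inspect, here is a minimal sketch that repeats the setup above and then looks at the group sizes. The names grp and ord_dend are introduced here for illustration only, and the dendrogram leaf order is shown as a possible alternative ordering, not as something the rest of this answer uses.

```r
library(MASS)  # for mvrnorm()
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp <- cutree(clust, k = 3)  # group membership (1, 2 or 3) for each sample
ord <- order(grp)            # samples sorted by group, as in the answer
print(table(grp))            # number of samples in each group

# Alternative (an assumption, not used in the answer): the left-to-right
# leaf order of the dendrogram, which also keeps each group contiguous
ord_dend <- clust$order
```

Either ordering places members of the same group next to each other, which is what makes the blocks visible in the image plots.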

Next compute the dissimilarities between samples implied by the dendrogram, the cophenetic distances:

coph <- cophenetic(clust)
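As a quick illustration of what the cophenetic distances are, the sketch below (same setup as above; n and n_heights are names made up for this example) checks that coph has one value per pair of samples, and that it takes at most n - 1 distinct values, because the cophenetic distance of a pair is the height at which the two samples are first merged in the dendrogram.

```r
library(MASS)  # for mvrnorm()
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph  <- cophenetic(clust)

n <- attr(dij, "Size")                        # number of samples
length(coph) == n * (n - 1) / 2               # one value per pair of samples
n_heights <- length(unique(as.vector(coph)))  # distinct cophenetic distances
n_heights <= n - 1                            # bounded by the number of merges
```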

Here are four plots, three image plots (1 to 3) and one scatterplot (4):

  1. The original dissimilarity matrix, sorted on the basis of the cluster analysis groupings,
  2. The cophenetic distances, again sorted as above,
  3. The difference between the original dissimilarities and the cophenetic distances,
  4. A Shepard-like plot comparing the original and cophenetic distances; the better the clustering is at capturing the original distances, the closer to the 1:1 line the points will lie.

Here is the code that produces these plots:

layout(matrix(1:4, ncol = 2))
image(as.matrix(dij)[ord, ord], main = "Original distances")
image(as.matrix(coph)[ord, ord], main = "Cophenetic distances")
image((as.matrix(coph) - as.matrix(dij))[ord, ord], 
      main = "Cophenetic - Original")
plot(coph ~ dij, ylab = "Cophenetic distances", xlab = "Original distances",
     main = "Shepard Plot")
abline(0,1, col = "red")
box()
layout(1)

Which produces this on the active device:

[Figure: plots of original and cophenetic distances]
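If the block structure is hard to see in the image plots, one option is to draw the group boundaries on top. The sketch below is only one way to do that: it assumes the default behaviour of image() for a bare matrix, where cell centres are spread evenly over [0, 1], and the names m, cum and bounds are made up for this example.

```r
library(MASS)  # for mvrnorm()
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")

grp <- cutree(clust, k = 3)
ord <- order(grp)
n   <- length(grp)
m   <- as.matrix(dij)[ord, ord]  # distances sorted by group

# image() puts the centre of cell i at (i - 1) / (n - 1), so the edge
# between cell i and cell i + 1 lies at (i - 0.5) / (n - 1)
cum    <- cumsum(table(grp))               # index of the last sample per group
bounds <- (cum[-length(cum)] - 0.5) / (n - 1)

image(m, main = "Original distances")
abline(v = bounds, h = bounds, lty = 2)    # dashed lines between groups
```

The squares along the diagonal are then the within-group distances, and the off-diagonal rectangles are the between-group distances.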

Having said all that, however, only the Shepard plot shows the "correlation between clustered data and [dis]similarity matrix", and that is not an image plot (levelplot). Each pairwise comparison consists of just two numbers, one cophenetic and one original [dis]similarity, so how would you propose to compute a correlation for every such pair?
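What can be computed is a single correlation over all pairs at once, usually called the cophenetic correlation. This is a summary of the whole Shepard plot in one number, not a per-cell quantity that could be drawn as an image. A minimal sketch, repeating the setup above (ccc and ccc_rank are names made up for this example, and the rank-based version is an optional extra):

```r
library(MASS)  # for mvrnorm()
set.seed(1)
dat <- data.frame(mvrnorm(100, mu = c(2, 6, 3),
                          Sigma = matrix(c(10,   2,   4,
                                            2,   3, 0.5,
                                            4, 0.5,   2), ncol = 3)))
dij   <- dist(scale(dat, center = TRUE, scale = TRUE))
clust <- hclust(dij, method = "average")
coph  <- cophenetic(clust)

# Pearson correlation between original and cophenetic distances,
# computed over all n * (n - 1) / 2 pairs of samples
ccc <- cor(as.vector(dij), as.vector(coph))

# Rank-based variant, less sensitive to a few very large distances
ccc_rank <- cor(as.vector(dij), as.vector(coph), method = "spearman")

print(c(pearson = ccc, spearman = ccc_rank))
```

Values closer to 1 indicate that the dendrogram preserves the original pairwise distances more faithfully.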

Upvotes: 13
