blue-sky

Reputation: 53896

How to understand this dendrogram

The values in this similarity matrix are based on Jaccard's coefficient:

    a,  b,  c
a,  1, .3, .6
b, .3,  1, .9
c, .6, .9,  1

To generate a cluster analysis I used this code:

tb = read.csv("c:\\Users\\Adrian\\Desktop\\sim-matrix.csv", row.names=1);
d  = as.dist(tb);
hclust(d);
plot(hclust(d, method="average"));

Which generates this dendrogram:

[Image: dendrogram produced by the code above]

The ?hclust help page does not provide any details.

Upvotes: 1

Views: 267

Answers (2)

embert

Reputation: 7602

I don't know what d = as.dist(tb); does, but I think hclust(d, method="average") assumes d to be a distance matrix.

Why are a & b grouped close together?

If you provide a similarity matrix, the low similarity of .3 between a and b is interpreted as a low distance, and thus as a high similarity. That would explain why a and b are grouped first.
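To see this concretely, here is a minimal sketch. It assumes the matrix from the question is typed in by hand rather than read from the CSV file, and the names sim and hc are only illustrative. It shows that as.dist() repackages the values without converting them, so the .3 between a and b ends up as the smallest "distance".

```r
# Similarity matrix from the question, entered by hand
# (assumption: the CSV file holds exactly these values).
sim <- matrix(c( 1, .3, .6,
                .3,  1, .9,
                .6, .9,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# as.dist() only keeps the lower triangle; it does not compute 1 - x.
d <- as.dist(sim)
print(d)

# Average linkage on the unconverted values.
hc <- hclust(d, method = "average")

# Negative entries in a merge row index single observations,
# so the first row identifies the pair that is joined first.
first_pair <- hc$labels[-hc$merge[1, ]]
print(first_pair)
```

If first_pair comes back as a and b, the similarities were indeed treated as distances.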

How is closeness measured?

Since you provided the similarity matrix, I think you are referring to how the closeness of clusters is measured when using average linkage. Assuming that the first point is appropriate, average linkage (in hclust, method="average" is UPGMA; WPGMA is method="mcquitty") takes the average of the similarities between all pairs of observations in distinct clusters. Let's check that:

Step 1:
Average similarities

  • a-b: .3
  • a-c: .6
  • c-b: .9

So we merge a and b at .3

Step 2:
Average similarities

  • ab-c: (.6 + .9) / (2*1) = 1.5 / 2 = .75

So merging ab-c should be at .75. Well, either my calculation is wrong or the dendrogram corresponds to complete linkage.
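One way to check which linkage a dendrogram corresponds to is to compare the merge heights directly. The sketch below again assumes the matrix is entered by hand, and sim and heights are illustrative names. With the values treated as distances, the second merge should sit at .75 under average linkage, at .9 under complete linkage, and at .6 under single linkage.

```r
# Similarity matrix from the question, entered by hand
# (assumption: the CSV file holds exactly these values).
sim <- matrix(c( 1, .3, .6,
                .3,  1, .9,
                .6, .9,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Deliberately NOT converted, to mirror the code in the question.
d <- as.dist(sim)

# One column of merge heights per linkage method.
heights <- sapply(c("average", "complete", "single"),
                  function(m) hclust(d, method = m)$height)
print(heights)
```

Reading the height of the top join off the plot and comparing it with the second row of heights tells you which method produced the picture.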

Upvotes: 1

plannapus

Reputation: 18759

The problem is that you never tell your code at any point that this is a similarity index. In fact you specifically say the opposite: as.dist(tb). hclust takes a matrix of distances, i.e. dissimilarities. The simplest way to go for you is:

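tb <- matrix(c(1,.3,.6,.3,1,.9,.6,.9,1), nrow=3,
             dimnames=list(c("a","b","c"), c("a","b","c")))  # label rows and columns
tb <- 1-tb          # similarity to dissimilarity
d <- as.dist(tb)    # now holds genuine distances
plot(hclust(d))     # default method is complete linkage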

Closeness (as you asked) was measured when you measured your Jaccard index.
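As a quick check of the conversion, here is a minimal sketch that inspects the result instead of plotting it. It uses method="average" to match the question rather than the hclust default, and the names sim and hc are illustrative. After taking 1 - similarity, b and c (similarity .9) should be joined first at height .1, and a should join them at (.7 + .4) / 2 = .55.

```r
# Similarity matrix from the question, entered by hand.
sim <- matrix(c( 1, .3, .6,
                .3,  1, .9,
                .6, .9,  1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Convert similarity to dissimilarity before building the dist object.
d <- as.dist(1 - sim)

# Average linkage, as in the question (assumption: this is the method wanted).
hc <- hclust(d, method = "average")

# Labels of the first merged pair, then the two merge heights.
first_pair <- hc$labels[-hc$merge[1, ]]
print(first_pair)
print(hc$height)
```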

Upvotes: 0
