Reputation: 53896

How to understand this dendrogram

The values in this similarity matrix are based on Jaccard's coefficient:

    a,  b,  c
a,  1, .3, .6
b, .3,  1, .9
c, .6, .9,  1

To run the cluster analysis I used this code:

tb = read.csv("c:\\Users\\Adrian\\Desktop\\sim-matrix.csv", row.names=1);
d  = as.dist(tb);
hclust(d);
plot(hclust(d, method="average"));

Which generates this dendrogram:

[Image: dendrogram of a, b and c]

Why are a & b grouped close together?
How is closeness measured?
Does the agglomeration method "average" average the corresponding values for a, b & c?

?hclust does not provide any details.

Upvotes: 1

Answers (2)

embert

Reputation: 7602

I don't know what d = as.dist(tb); does, but I think hclust(d, method="average") assumes d is a distance matrix.

Why are a & b grouped close together

If you provide a similarity matrix, the low similarity of .3 between a and b is interpreted as a low distance, that is, as a high similarity. That would explain why a and b are grouped first.

How is closeness measured?

Since you provided the similarity matrix, I think you are asking how the closeness of clusters is measured under average linkage. Assuming the first point holds, average linkage (in hclust, method="average" is UPGMA; WPGMA is method="mcquitty") takes the average of the pairwise values between observations in the two clusters. Let's check that:

Step 1:
Average similarities

a-b: .3
a-c: .6
c-b: .9

So we merge a and b at .3

Step 2:
Average similarities

ab-c: (.6 + .9) / (2*1) = 1.5 / 2 = .75

So ab and c should merge at .75. Either my calculation is wrong or the dendrogram corresponds to complete linkage.
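
One way to check the arithmetic is to ask hclust for the merge heights directly. This is only a minimal sketch: it rebuilds the question's matrix inline instead of reading the CSV (assuming the file holds the same values), and, like the question's code, it passes the similarities to as.dist unchanged. The variable names are illustrative.

```r
# Similarity matrix from the question, rebuilt inline
# (assumption: same values as the CSV file)
labs <- c("a", "b", "c")
tb <- matrix(c( 1, .3, .6,
               .3,  1, .9,
               .6, .9,  1),
             nrow = 3, dimnames = list(labs, labs))

# As in the question, the similarities are treated as if they were distances
d <- as.dist(tb)

# $height holds the value at which each successive merge happens
h_average  <- hclust(d, method = "average")$height
h_complete <- hclust(d, method = "complete")$height

print(h_average)   # second entry: average-linkage height for ab-c
print(h_complete)  # second entry: complete-linkage height for ab-c
```

With these values, average linkage should put the ab-c merge at (.6 + .9) / 2 = .75, while complete linkage uses the larger of the two values, .9. Comparing both height vectors with the plotted dendrogram shows which linkage the plot reflects.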

Upvotes: 1

plannapus

Reputation: 18759

The problem is that at no point does your code say that this is a similarity index. In fact you specifically say the opposite: as.dist(tb). hclust takes a distance matrix, i.e. dissimilarities. The simplest fix is:

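tb <- matrix(c(1,.3,.6,.3,1,.9,.6,.9,1), nrow=3,
             dimnames=list(c("a","b","c"), c("a","b","c"))) # label the leaves a, b, c
tb <- 1-tb        # similarity to dissimilarity
d <- as.dist(tb)
plot(hclust(d))   # default method is "complete"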

Closeness (as you asked) was measured when you measured your Jaccard index.
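
To see what the conversion changes, here is a short sketch that compares the first merge with and without the 1 - similarity step. It assumes the matrix values from the question; sim, hc_raw and hc_fixed are illustrative names, and hclust is left at its default linkage.

```r
labs <- c("a", "b", "c")
sim <- matrix(c( 1, .3, .6,
                .3,  1, .9,
                .6, .9,  1),
              nrow = 3, dimnames = list(labs, labs))

hc_raw   <- hclust(as.dist(sim))      # similarities misread as distances
hc_fixed <- hclust(as.dist(1 - sim))  # Jaccard similarity -> dissimilarity

# Negative entries in $merge are single observations;
# the first row describes the first merge
first_raw   <- labs[abs(hc_raw$merge[1, ])]
first_fixed <- labs[abs(hc_fixed$merge[1, ])]

print(first_raw)        # pair merged first when similarities are used as-is
print(first_fixed)      # pair merged first after the conversion
print(hc_fixed$height)  # merge heights on the dissimilarity scale
```

Without the conversion the least similar pair (a and b, similarity .3) is joined first; after it, the most similar pair (b and c, similarity .9, dissimilarity .1) is joined first.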

Upvotes: 0
