Antonio

Reputation: 1111

Hierarchical cluster analysis help - dendrogram

I wrote some code that generates a dendrogram with the hclust function, as you can see in the image. I would like help interpreting this dendrogram. Note that the points are located close to one another. What does this dendrogram mean? I would really appreciate a more complete analysis of the generated output.

library(geosphere)

Points_properties <- structure(list(
  Propertie = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29),
  Latitude = c(-24.781624, -24.775017, -24.769196, -24.761741, -24.752019,
               -24.748008, -24.737312, -24.744718, -24.751996, -24.724589,
               -24.8004, -24.796899, -24.795041, -24.780501, -24.763376,
               -24.801715, -24.728005, -24.737845, -24.743485, -24.742601,
               -24.766422, -24.767525, -24.775631, -24.792703, -24.790994,
               -24.787275, -24.795902, -24.785587, -24.787558),
  Longitude = c(-49.937369, -49.950576, -49.927608, -49.92762, -49.920608,
                -49.927707, -49.922095, -49.915438, -49.910843, -49.899478,
                -49.901775, -49.89364, -49.925657, -49.893193, -49.94081,
                -49.911967, -49.893358, -49.903904, -49.906435, -49.927951,
                -49.939603, -49.941541, -49.94455, -49.929797, -49.92141,
                -49.915141, -49.91042, -49.904772, -49.894034)),
  row.names = c(NA, -29L),
  class = c("tbl_df", "tbl", "data.frame"))

coordinates<-subset(Points_properties,select=c("Latitude","Longitude"))
plot(coordinates[,2:1])
text(x = Points_properties$Longitude,
y= Points_properties$Latitude, labels=Points_properties$Propertie, pos=2)

[Figure: scatter plot of the 29 points, labelled by property number]

d<-distm(coordinates[,2:1])
d<-as.dist(d)
fit.average<-hclust(d,method="average")
plot(fit.average,hang=-1,cex=.8, main = "")

[Figure: dendrogram produced by plot(fit.average)]

Upvotes: 3

Views: 262

Answers (1)

Waldi

Reputation: 41260

You chose to perform hierarchical clustering using average linkage (method = "average").

According to ?hclust:

This function performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered. Initially, each object is assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage distances between clusters are recomputed

You can follow what happens using the merge field:

Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation −j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm

fit.average$merge
      [,1] [,2]
 [1,]  -21  -22
 [2,]  -15    1
 [3,]  -13  -24
 [4,]   -6  -20
 [5,]   -2  -23
 [6,]  -16  -27
...
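
To read the merge table together with the level at which each merge happens, it can help to put merge and height side by side. This is only a sketch on a small made-up data set (pts, fit and steps are names I introduce here, not objects from the question); the same idea should apply to fit.average.

```r
# Toy data: five 2-D points (hypothetical, not the coordinates from the question)
pts <- rbind(c(0, 0), c(0, 1), c(5, 0), c(5, 1), c(10, 10))
fit <- hclust(dist(pts), method = "average")

# One row per merge step: the two items that were joined and the height of the join.
# Negative entries are single observations, positive entries refer to earlier steps.
steps <- data.frame(step   = seq_len(nrow(fit$merge)),
                    left   = fit$merge[, 1],
                    right  = fit$merge[, 2],
                    height = fit$height)
print(steps)
```

With n observations there are always n - 1 rows, and the height column is what the dendrogram draws on its y-axis.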

This is what you see in the dendrogram.

The height on the y-axis of the dendrogram is the dissimilarity at which two clusters are merged. Because you use method = "average", this is the mean of all pairwise distances between the members of the two clusters being joined (here in meters, since distm returns meters).

  1. points 21 and 22 (which are the nearest) are merged together, creating cluster 1
  2. cluster 1 is merged with point 15, creating cluster 2
  3. ...
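
If you want to check what the height means under average linkage, you can compare one merge height with the mean of the pairwise distances between the two clusters joined at that step. A minimal sketch on made-up points, using base dist instead of distm (pts, fit, dm, a and b are illustrative names, not objects from the question):

```r
# Toy data: two tight pairs plus one far-away point (hypothetical values)
pts <- rbind(c(0, 0), c(0, 1), c(5, 0), c(5, 1), c(10, 10))
d   <- dist(pts)
fit <- hclust(d, method = "average")
dm  <- as.matrix(d)

# State of the clustering just before merge step 3: three clusters.
# In this toy data they are {1, 2}, {3, 4} and {5}, and step 3 joins the two pairs.
members <- cutree(fit, k = 3)
a <- which(members == 1)
b <- which(members == 2)

# Mean of all distances between a member of a and a member of b
mean_between <- mean(dm[a, b])

# Compare with the height recorded for merge step 3
all.equal(mean_between, fit$height[3])
```

The two numbers should agree, which is why the height should not be read as a distance to a cluster center.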

You could then call rect.hclust, which accepts various arguments, such as the number of groups k you'd like:

rect.hclust(fit.average, k=3)

[Figure: dendrogram with the k = 3 groups outlined by rect.hclust]

You can also use the output of rect.hclust to color the original points:

groups <- rect.hclust(fit.average, k=3)
groups

#[[1]]
# [1]  5  6  7  8  9 10 17 18 19 20

#[[2]]
# [1]  1  2  3  4 15 21 22 23

#[[3]]
#  [1] 11 12 13 14 16 24 25 26 27 28 29

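# one group id per point, in the order the points are listed in groups
colors <- rep(seq_along(groups), lengths(groups))
# reorder so that colors[i] is the group of point i
colors <- colors[order(unlist(groups))]

plot(coordinates[,2:1], col = colors)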
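
As an alternative, cutree returns the group of each observation directly, already in the original row order, so no reordering is needed. A small sketch on made-up points (pts and fit are illustrative names; with your data you would pass fit.average instead):

```r
# Toy data (hypothetical values, not the coordinates from the question)
pts <- rbind(c(0, 0), c(0, 1), c(5, 0), c(5, 1), c(10, 10))
fit <- hclust(dist(pts), method = "average")

# One group label per observation, in the same order as the rows of pts
groups <- cutree(fit, k = 3)
table(groups)

# Labels are small integers, so they can be used directly as colors
plot(pts, col = groups, pch = 19)
```

Note that the group numbers from cutree follow the order of the observations, so they will not necessarily match the left-to-right order of the rectangles drawn by rect.hclust.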

[Figure: scatter plot of the points colored by group]

Upvotes: 6
