blue-sky

Reputation: 53826

What is the clustering algorithm used by hclust in R?

I've been using hclust; here is the code:

hc = hclust(dist(mydata))
## tweaking some parameters for plotting a dendrogram
# set background color
op = par(bg="#DDE3CA")
# plot dendrogram
plot(hc, col="#487AA1", col.main="#45ADA8", col.lab="#7C8071",
     col.axis="#F38630", lwd=3, lty=3, sub='', hang=-1, axes=FALSE)
# add axis
axis(side=2, at=seq(0, 400, 100), col="#F38630",
     labels=FALSE, lwd=2)
# add text in margin
mtext(seq(0, 400, 100), side=2, at=seq(0, 400, 100),
      line=1, col="#A38630", las=2)
par(op)

Which variant of hierarchical clustering does hclust use? I want to implement it programmatically. Is it the same as the algorithm described on Wikipedia: http://en.wikipedia.org/wiki/Hierarchical_clustering ?

Upvotes: 2

Views: 2609

Answers (2)

Martin Mächler

Reputation: 4765

Note, however, that the code has been tweaked (i.e. improved!) in R several times; the algorithms in R are now both more versatile and, in one place, considerably more efficient than the original Statlib code mentioned in the other answer. Do follow Joshua Ulrich's advice: after reading the help documentation, read R's source code rather than the original in Statlib. As R uses HTTP-based svn, you can browse all R source code in your browser. The hclust source is at http://svn.r-project.org/R/trunk/src/library/stats/src/hclust.f

One further note: agnes() in package cluster provides even more versatile agglomerative clustering methods, notably one whole additional class in the next release of cluster. All of these are also in svn repositories and available in the same way; for agnes, see http://svn.r-project.org/R-packages/trunk/cluster/src/twins.c (also translated from old Fortran, but now a bit more readable).
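For anyone who wants to implement this by hand, here is a minimal sketch of the generic agglomerative scheme (the one the Wikipedia article describes) with complete linkage, which is the documented default of hclust's method argument. It is not how hclust.f is written; naive_complete_heights is a hypothetical helper, and the random matrix is only a stand-in for mydata. It assumes there are no tied distances, since ties can change the merge order.

```r
# Naive O(n^3) complete-linkage agglomerative clustering (illustration only).
# naive_complete_heights is a hypothetical helper, not part of R or hclust.
naive_complete_heights <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                      # a cluster is never merged with itself
  heights <- numeric(0)
  while (nrow(m) > 1) {
    # locate the closest pair of current clusters
    idx <- which(m == min(m), arr.ind = TRUE)[1, ]
    i <- min(idx)
    j <- max(idx)
    heights <- c(heights, m[i, j])    # merge height = their dissimilarity
    # complete linkage: distance from the merged cluster to any other
    # cluster is the larger of the two old distances
    merged <- pmax(m[i, ], m[j, ])
    m[i, ] <- merged
    m[, i] <- merged
    m[i, i] <- Inf
    m <- m[-j, -j, drop = FALSE]      # cluster j is absorbed into cluster i
  }
  heights
}

set.seed(1)
x  <- matrix(rnorm(40), ncol = 2)     # stand-in for mydata (20 points)
d  <- dist(x)                         # Euclidean distances by default
hc <- hclust(d, method = "complete")  # same as hclust(d): "complete" is the default

# compare the naive merge heights with those reported by hclust
all.equal(naive_complete_heights(d), hc$height)
```

Other linkages ("single", "average", and so on) differ only in how the distance to the merged cluster is updated; swapping pmax for pmin in the sketch gives single linkage.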

Upvotes: 4

TWL

Reputation: 2300

The hclust implementation is based on the Fortran code by Fionn Murtagh, which is deposited in Statlib: http://lib.stat.cmu.edu/S/multiv. All the methods are described in his manuscript "Multivariate Data Analysis with Fortran, C and Java Code"; you can find it here. His resource website http://www.classification-society.org/csna/mda-sw/ is also a good starting point. Hope this helps.

Upvotes: 4
