Zhubarb

Reputation: 11885

R clustering: silhouette plot with observation labels

I am doing hierarchical clustering in R with hclust and the cluster package. Using the silhouette function, I can get the silhouette plot of my clustering for any given height (h) cut-off in the dendrogram.

# run hierarchical clustering
if(!require("cluster")) { install.packages("cluster");  require("cluster") } 
tmp <- matrix(c( 0,  20,  20,  20,  40,  60,  60,  60, 100, 120, 120, 120,
                 20,   0,  30,  50,  60,  80,  40,  80, 120, 100, 140, 120,
                 20,  30,   0,  40,  60,  80,  80,  80, 120, 140, 140,  80,
                 20,  50,  40,   0,  60,  80,  80,  80, 120, 140, 140, 140,
                 40,  60,  60,  60,   0,  20,  20,  20,  60,  80,  80,  80,
                 60,  80,  80,  80,  20,   0,  20,  20,  40,  60,  60,  60,
                 60,  40,  80,  80,  20,  20,   0,  20,  60,  80,  80,  80,
                 60,  80,  80,  80,  20,  20,  20,   0,  60,  80,  80,  80,
                 100, 120, 120, 120,  60,  40,  60,  60,   0,  20,  20,  20,
                 120, 100, 140, 140,  80,  60,  80,  80,  20,   0,  20,  20,
                 120, 140, 140, 140,  80,  60,  80,  80,  20,  20,   0,  20,
                 120, 120,  80, 140,  80,  60,  80,  80,  20,  20,  20,   0),
                 nr=12, dimnames=list(LETTERS[1:12], LETTERS[1:12]))

cl <- hclust(as.dist(tmp, diag = TRUE, upper = TRUE), method = 'single')
sil_cl <- silhouette(cutree(cl, h = 25), as.dist(tmp), title = title(main = 'Good'))
plot(sil_cl)

This gives the figure below, and this is what frustrates me: how can I use the observation labels, rownames(tmp), in the silhouette plot instead of the numeric indices (1 to 12), which make no sense to me?

[Figure: silhouette plot of sil_cl, with observations labeled by the numeric indices 1 to 12]

Upvotes: 4

Views: 7686

Answers (2)

John

Reputation: 109

I found that adding the argument cex.names = par("cex.axis") to the plot() call gives you the desired labels:

cl <- hclust(as.dist(tmp, diag = TRUE, upper = TRUE), method = 'single')
sil_cl <- silhouette(cutree(cl, h = 25), as.dist(tmp), title = title(main = 'Good'))

plot(sil_cl, cex.names = par("cex.axis"))

Upvotes: 1

MrFlick

Reputation: 206197

I'm not sure why, but the silhouette call seems to drop the row names. You can add them back with:

cl <- hclust(as.dist(tmp, diag = TRUE, upper = TRUE), method = 'single')
sil_cl <- silhouette(cutree(cl, h = 25), as.dist(tmp), title = title(main = 'Good'))

rownames(sil_cl) <- rownames(tmp)

plot(sil_cl)
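
For anyone who wants to try this without the 12 x 12 matrix from the question, here is a minimal self-contained sketch of the same idea. The toy data (pts), the choice of k = 2, and the plot title are my own assumptions for illustration and are not part of the original question. It takes the labels from the named vector that cutree returns, so it does not depend on the original matrix still being around.

```r
library(cluster)

# Hypothetical toy data: six labeled points on a line, forming two obvious groups
pts <- matrix(c(1, 2, 3, 10, 11, 12), ncol = 1,
              dimnames = list(LETTERS[1:6], "x"))

d   <- dist(pts)                    # the dist object keeps the row names as labels
cl  <- hclust(d, method = "single")
grp <- cutree(cl, k = 2)            # named integer vector: names are the labels

sil <- silhouette(grp, d)

# Attach the observation labels explicitly; plot() uses the row names as bar labels
rownames(sil) <- names(grp)

plot(sil, main = "Silhouette plot with observation labels")
```

If you have many observations, note that plot() for silhouette objects also has the arguments nmax.lab (labels are only drawn up to that many observations) and max.strlen (labels are truncated to that length), so long names or large data sets may need those adjusted.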


Upvotes: 4
