Reputation: 57
I am creating an hclust
object manually (i.e. creating a list with the required slots, then changing its class to hclust
). The merging pattern, heights of bifurcations, ordering of leaf nodes and labels of leaf nodes are known. My goal (and means of testing) is to plot the resulting dendrogram. I am unable to create a plottable hclust
object with my parameters.
The components of an hclust
object are described in the hclust
function documentation
here (see section Value).
The following is a reproducible chunk of R
code I use to generate and plot my dendrogram.
tree <- list()
tree$merge <- matrix(c( -1, -7, # row 1
-2, -6, # row 2
-3, -12, # row 3
-4, -14, # row 4
-5, -8, # row 5
-9, -11, # row 6
-13, -20, # row 7
-15, -19, # row 8
1, 8, # row 9
2, 5, # row 10
3, 6, # row 11
2, -18, # row 12
1, 3, # row 13
2, 4, # row 14
-10, 7, # row 15
-16, -17, # row 16
1, 2, # row 17
15, 16, # row 18
1, 15), # row 19
ncol = 2,
byrow = TRUE)
tree$height <- c(0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.11167131, 0.11167131, 0.11167131, 0.12832304, 0.17304035, 0.17304035, 0.17304035, 0.17304035, 0.22965349, 0.22965349, 0.23334799)
tree$labels <- as.character(1:20)
tree$order <- c(1, 7, 15, 19, 3, 12, 9, 11, 2, 6, 5, 8, 18, 4, 14, 13, 20, 10, 16, 17)
class(tree) <- "hclust"
plot(tree)
Each row of the tree$merge
matrix corresponds to a bifurcation. Negative integers refer to the indices of a leaf nodes, whereas positive integers refer to existing clusters by row indices in tree$merge
.
Running the code results in the following error message.
Error in plot.hclust(tree) : 'merge' matrix has invalid contents
A sketch of the intended result is pictured below, with the heights
values marked by additional dotted lines. (The drawing is not to scale.)
Upvotes: 1
Views: 1074
Reputation: 4233
The validity of a hclust
tree is checked by the .validity.hclust
function. Its source code is given here. Look at lines 121-135.
That you got the error means that your tree is not valid because of its merge
matrix. It has non-unique elements (e.g., 1 and 2). In a properly constructed merge
matrix, all entries are unique and run from -N_obs
to N_obs-2
(zero excluded), where N_obs
is a (positive) number of observations. This is checked by the following if
test in the code:
if(identical(sort(as.integer(merge)), c(-(n:1L), +seq_len(n-2L))))
TRUE
else
"'merge' matrix has invalid contents"
From the reference of hclust
:
merge an n − 1 by 2 matrix.
Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation − j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
All negative entries are singletons (observations), and positive numbers are merges of existing clusters and refer to merging steps of the algorithm.
So, revise your hclust
object. Here is some code to give you an idea what a proper hclust
object looks like:
iris2 <- iris[1:20,-5]
species_labels <- iris[,5]
d_iris <- dist(iris2)
tree_iris <- hclust(d_iris, method = "complete")
Take a closer look at tree_iris$merge
.
UPDATE
After I got more time, I decided to fix your code. I modified the merge
entry of the tree
. This is what the working code that reproduces your dendrogram looks like:
tree <- list()
tree$merge <- matrix(c( -1, -7, # row 1
-2, -6, # row 2
-3, -12, # row 3
-4, -14, # row 4
-5, -8, # row 5
-9, -11, # row 6
-13, -20, # row 7
-15, -19, # row 8
1, 8, # row 9: 1,7,15,19
2, 5, # row 10: 2,6,5,8
3, 6, # row 11: 3,12,9,11
10, -18, # row 12: 2,6,5,8 + 18
9, 11, # row 13: 1,7,15,19 + 3,12,9,11
12, 4, # row 14: row 12 + row 4
-10, 7, # row 15: row 7 + 10
-16, -17, # row 16
13, 14, # row 17: row 13 + row 14
15, 16, # row 18: row 15 + row 16
17, 18), # row 19: row 17 + row 18
ncol = 2,
byrow = TRUE)
tree$height <- c(0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.06573653, 0.11167131, 0.11167131, 0.11167131, 0.12832304, 0.17304035, 0.17304035, 0.17304035, 0.17304035, 0.22965349, 0.22965349, 0.23334799)
tree$labels <- as.character(1:20)
tree$order <- c(1, 7, 15, 19, 3, 12, 9, 11, 2, 6, 5, 8, 18, 4, 14, 13, 20, 10, 16, 17)
class(tree) <- "hclust"
plot(tree)
Upvotes: 2