Reputation: 167
Hopefully title is not too badly worded. I have a tree that I used cutree to obtain groups from, but it is clear that the groups are not numbered left-to-right or right-to-left (I know the orientation within a branch doesn't matter so much, was hoping the grouping would be the same as the ordering in the hclust object). Is it possible to extract groups from a tree (using the height option of cutree) and know which of those groups are more related to one another? I walk through an example using USArrests below.
hc <- hclust(dist(USArrests), "ave")
plot(hc)
cutree(hc,h=60)
Alabama Alaska Arizona Arkansas California
1 1 1 2 1
Colorado Connecticut Delaware Florida Georgia
2 3 1 4 2
Hawaii Idaho Illinois Indiana Iowa
3 3 1 3 3
Kansas Kentucky Louisiana Maine Maryland
3 3 1 3 1
Massachusetts Michigan Minnesota Mississippi Missouri
2 1 3 1 2
Montana Nebraska Nevada New Hampshire New Jersey
3 3 1 3 2
New Mexico New York North Carolina North Dakota Ohio
1 1 4 3 3
Oklahoma Oregon Pennsylvania Rhode Island South Carolina
2 2 3 2 1
South Dakota Tennessee Texas Utah Vermont
3 2 2 3 3
Virginia Washington West Virginia Wisconsin Wyoming
2 2 3 3 2
If you plot the tree it is clear that groups 1 and 4 are more related then groups 2 and 3 are more related. However when I just print the contents of each group there is no way to know what that relationship is. Is there a function or standard process I am missing? The real data I'm working with I split 36k values into 10 groups, so it would be tough to visually validate the relationships as I do with the example data, and want to code it as a script for future analyses. Thanks ahead of time.
Upvotes: 2
Views: 748
Reputation: 206242
I think you want to use
hc <- hclust(dist(USArrests), "ave")
cuthc <- cut(as.dendrogram(hc), h=60)
This will return a list with an $upper
showing the tree above the cut, and a $lower
element which is a list of each of the subtrees made from the cut. We can plot them with
layout(matrix(1:4, ncol=2))
sapply(1:4, function(i) plot(cuthc$lower[[i]]))
Then, if you want to extract the names and groups in the order they appear in the dendrograms, you can do
stack(setNames(Map(labels, cuthc$lower),seq_along(cuthc$lower)))
Here I use stack()
and setNames()
just to assign a unique ID to each element in the $lower
list. stack()
doesn't like it when the list isn't named
Upvotes: 4