sessmurda

Reputation: 167

R: How to obtain relationships between cutree groups?

Hopefully the title is not too badly worded. I have a tree from which I obtained groups with cutree, but the groups are clearly not numbered left-to-right or right-to-left. (I know the orientation within a branch doesn't matter much; I was hoping the group numbering would follow the ordering in the hclust object.) Is it possible to extract groups from a tree (using the h, i.e. height, argument of cutree) and know which of those groups are more closely related to one another? I walk through an example using USArrests below.

hc <- hclust(dist(USArrests), "ave")
plot(hc)
cutree(hc,h=60)
       Alabama         Alaska        Arizona       Arkansas     California 
         1              1              1              2              1 
  Colorado    Connecticut       Delaware        Florida        Georgia 
         2              3              1              4              2 
    Hawaii          Idaho       Illinois        Indiana           Iowa 
         3              3              1              3              3 
    Kansas       Kentucky      Louisiana          Maine       Maryland 
         3              3              1              3              1 
Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
         2              1              3              1              2 
   Montana       Nebraska         Nevada  New Hampshire     New Jersey 
         3              3              1              3              2 
New Mexico       New York North Carolina   North Dakota           Ohio 
         1              1              4              3              3 
  Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
         2              2              3              2              1 
South Dakota      Tennessee          Texas           Utah        Vermont 
         3              2              2              3              3 
  Virginia     Washington  West Virginia      Wisconsin        Wyoming 
         2              2              3              3              2 

If you plot the tree, it is clear that groups 1 and 4 are closely related to each other, and groups 2 and 3 are closely related to each other. However, when I just print the contents of each group, there is no way to know what that relationship is. Is there a function or standard process I am missing? In my real data I split 36k values into 10 groups, so it would be hard to validate the relationships visually as I can with the example data, and I want to code this as a script for future analyses. Thanks ahead of time.
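For the narrower goal of having the cutree IDs follow the left-to-right order of plot(hc), here is a minimal sketch on the same USArrests example. It relies on hc$order, which lists the observations in plotted order, so the order in which group IDs first appear along it is the groups' plot order. The names lr_order and g_lr are illustrative. This only renumbers the groups; groups that sit next to each other in the plot are not necessarily each other's closest relatives.

```r
# Cluster the example data as in the question
hc <- hclust(dist(USArrests), "ave")
g <- cutree(hc, h = 60)   # cutree numbers groups by first appearance in the data

# hc$order gives the observations in the left-to-right order used by plot(hc);
# the order in which the group IDs first show up along it is the plot order
lr_order <- unique(g[hc$order])

# Renumber so that 1 is the leftmost group in the plot, 2 the next, and so on
g_lr <- setNames(match(g, lr_order), names(g))

# Cross-tabulate the two numberings to see how they correspond
print(table(cutree_id = g, left_to_right_id = g_lr))
```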

Upvotes: 2

Views: 748

Answers (1)

MrFlick

Reputation: 206242

I think you want to use

hc <- hclust(dist(USArrests), "ave")
cuthc <- cut(as.dendrogram(hc), h=60)

This returns a list with an $upper element holding the tree above the cut and a $lower element that is a list of the subtrees produced by the cut. We can plot them with
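The $upper element is what carries the between-group relationships. A short sketch of inspecting it, on the same example: in R's cut.dendrogram the leaves of $upper stand in for the cut-off subtrees and are labelled "Branch 1", "Branch 2", ... in the same order as $lower, so plotting $upper shows directly which groups join first. The variable name sizes is illustrative.

```r
hc <- hclust(dist(USArrests), "ave")
cuthc <- cut(as.dendrogram(hc), h = 60)

# Leaves of $upper replace the cut-off subtrees; they are labelled
# "Branch 1", "Branch 2", ... in the same order as cuthc$lower
print(labels(cuthc$upper))

# Number of original observations in each lower subtree
sizes <- sapply(cuthc$lower, function(d) attr(d, "members"))
print(sizes)

# Drawing $upper shows how the groups relate to one another
plot(cuthc$upper)
```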

layout(matrix(1:4, ncol=2))
for (i in seq_along(cuthc$lower)) plot(cuthc$lower[[i]])


Then, if you want to extract the names and groups in the order they appear in the dendrograms, you can do

stack(setNames(Map(labels, cuthc$lower),seq_along(cuthc$lower)))

Here I use stack() and setNames() just to assign a unique ID to each element of the $lower list, because stack() needs the list to be named. These IDs follow the left-to-right order of the subtrees in the dendrogram, not cutree's numbering.
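If you want the relationships as numbers rather than a picture (e.g. for 10 groups out of 36k values), one possible approach is to compute, for every pair of cutree groups, the height at which they are first joined in the tree; smaller heights mean more closely related groups. This is a minimal sketch, not a built-in function: group_join_heights is a hypothetical helper that walks hc$merge and hc$height, so it never builds the full cophenetic matrix, and it keeps cutree's own group numbering.

```r
# Hypothetical helper: height at which each pair of cutree groups is first
# joined in the tree. Smaller values mean more closely related groups.
group_join_heights <- function(hc, h) {
  g <- cutree(hc, h = h)   # group ID for every observation
  k <- max(g)
  D <- matrix(0, k, k, dimnames = list(seq_len(k), seq_len(k)))
  # Set of groups contained in the cluster formed at each merge step
  members <- vector("list", nrow(hc$merge))
  groups_in <- function(idx) {
    # Negative entries in hc$merge are single observations; positive ones
    # refer to the cluster formed at an earlier merge step
    if (idx < 0) g[[-idx]] else members[[idx]]
  }
  for (s in seq_len(nrow(hc$merge))) {
    a <- groups_in(hc$merge[s, 1])
    b <- groups_in(hc$merge[s, 2])
    # Once two groups share a cluster they stay together, so the first
    # merge that brings them together gives their joining height
    for (i in a) for (j in b) {
      if (i != j && D[i, j] == 0) D[i, j] <- D[j, i] <- hc$height[s]
    }
    members[[s]] <- union(a, b)
  }
  as.dist(D)
}

hc <- hclust(dist(USArrests), "ave")
d_groups <- group_join_heights(hc, h = 60)
print(round(d_groups, 1))

# Single-linkage clustering on these heights redraws the tree above the cut
plot(hclust(d_groups, "single"))
```

The joining height between two groups equals the cophenetic distance between any member of one and any member of the other, so for small data you could also read it off cophenetic(hc); the merge-walking version just avoids materialising that n-by-n matrix.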

Upvotes: 4
