csgroen
csgroen

Reputation: 2551

Merging multiple hclust objects (or dendrograms)

Is there an easy way to merge multiple hclust objects (or dendrograms) at the root?

I've made the example as complete as possible to illustrate my problem.

Let's say I'd like to cluster USArrests by region and then unite all the hclust objects to plot them together in a heatmap.

USArrests
Northeast <- c("Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island", 
"Vermont", "New Jersey", "New York", "Pennsylvania")
Midwest <-  c("Illinois", "Indiana", "Michigan", "Ohio",  "Wisconsin", 
    "Iowa", "Kansas", "Minnesota", "Missouri", "Nebraska", "North Dakota", 
    "South Dakota")
South <- c("Delaware", "Florida", "Georgia", "Maryland", "North Carolina", 
           "South Carolina", "Virginia", "West Virginia", 
           "Alabama", "Kentucky", "Mississippi", "Tennessee", "Arkansas", 
           "Louisiana", "Oklahoma", "Texas")
West <- c("Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", 
          "Utah", "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington")

h1 <- hclust(dist(USArrests[Northeast,]))
h2 <- hclust(dist(USArrests[Midwest,]))
h3 <- hclust(dist(USArrests[South,]))
h4 <- hclust(dist(USArrests[West,]))

Now I have 4 hclust objects (h1 through h4). I usually merge them like so:

hc <- as.hclust(merge(merge(merge(
    as.dendrogram(h1), as.dendrogram(h2)), as.dendrogram(h3)), 
    as.dendrogram(h4)))

Then, to plot them, I have to reorder the matrix according to the hclust objects, then plot (I've added some annotation to make the plot clearer):

usarr <- USArrests[c(Northeast, Midwest, South, West),]

region_annotation <- data.frame(Region = c(rep("Northeast", length(Northeast)), 
                                rep("Midwest", length(Midwest)),
                                rep("South", length(South)),
                                rep("West", length(West))),
                                row.names = c(Northeast, Midwest, South, West))

pheatmap(usarr, cluster_rows = hc, 
         annotation_row = region_annotation)

Heatmap results, with some additional graphical parameters for beauty's sake

In summary: is there an easier way to do this than merging all the separate hclusts?

Upvotes: 5

Views: 1777

Answers (2)

csgroen
csgroen

Reputation: 2551

I ended up making a couple of functions to do this more automatically. (In my version, I've also added support for correlation "distance", so it's a little bigger)

hclust_semisupervised <- function(data, groups, dist_method = "euclidean",
                                  dist_p = 2, hclust_method = "complete") {
    hclist <- lapply(groups, function (group) {
        hclust(dist(data[group,], method = dist_method, p = dist_p), method = hclust_method)
    })
    hc <- .merge_hclust(hclist)
    data_reordered <- data[unlist(groups),]

    return(list(data = data_reordered, hclust = hc))
}

.merge_hclust <- function(hclist) {
    #-- Merge
    d <- as.dendrogram(hclist[[1]])
    for (i in 2:length(hclist)) {
        d <- merge(d, as.dendrogram(hclist[[i]]))
    }
    as.hclust(d)
}

Having USArrests and the region vectors, I call hclust_semisupervised like so:

semi_hc <- hclust_semisupervised(USArrests, list(Northeast, Midwest, South, West)

Now plotting the heatmap:

pheatmap(semi_hc$data, cluster_rows = semi_hc$hclust, 
         annotation_row = region_annotation)

Upvotes: 3

Sathish
Sathish

Reputation: 12723

To create a merged hclust object, you can use <<- safely inside a custom environment created using new.env.

There may be other ways to create a merged object two at a time without using <<-. Hopefully, someone can throw a light on it.

I tried to use do.call('merge', list( dendrograms of h1, h2, h3, h4 ). But it did not work, because hclust requires two branches at the top instead of 4 branches.

Code:

library('pheatmap')
myenv <- new.env()
myenv$hc <- as.dendrogram( hclust( dist(USArrests[Northeast,])))
invisible( lapply( list( Midwest, South, West), function(x){
  myenv$hc <<- merge( myenv$hc, as.dendrogram( hclust( dist( USArrests[ x, ]) )) )
  NULL
} ) )
myenv$hc <- as.hclust(myenv$hc)

Graph:

pheatmap(usarr, cluster_rows = myenv$hc, 
         annotation_row = region_annotation)

enter image description here

Upvotes: 2

Related Questions