Kevin

Reputation: 2291

Strange error of Hierarchical Clustering in R

My R program is as below:

hcluster <- function(dmatrix) {
    imatrix <- NULL
    hc <- hclust(dist(dmatrix), method="average")
    for(h in sort(unique(hc$height))) {
        hc.index <- c(h,as.vector(cutree(hc,h=h)))
        imatrix <- cbind(imatrix, hc.index)
    }
    return(imatrix)
}

dmatrix_file = commandArgs(trailingOnly = TRUE)[1]
print(paste('Reading distance matrix from', dmatrix_file))
dmatrix <- as.matrix(read.csv(dmatrix_file,header=FALSE))

imatrix <- hcluster(dmatrix)
imatrix_file = paste("results",dmatrix_file,sep="-")
print(paste('Writing results to', imatrix_file))
write.table(imatrix, file=imatrix_file, sep=",", quote=FALSE, row.names=FALSE, col.names=FALSE)
print('done!')

My input is a distance matrix (symmetric, of course). When I run the program above on a distance matrix with more than a few thousand records (nothing goes wrong with several hundred), I get this error message:

Error in cutree(hc, h = h) : 
  the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first
Calls: hcluster -> as.vector -> cutree
Execution halted

My machine has about 16 GB of RAM and 4 CPUs, so resources shouldn't be the problem.

Can anyone please tell me what the problem is? Thanks!

Upvotes: 6

Views: 3357

Answers (2)

Chris Hinshaw

Reputation: 7255

Looking at the source of the cutree function here: http://code.ohloh.net/file?fid=QM4q0tWQlv2VywAoSr2MfgcNjnA&cid=ki3UJjFJ8jA&s=cutree%20component%20of%20is%20not%20sorted&mp=1&ml=1&me=1&md=1&browser=Default#L1

You could try passing the k argument (the number of groups, as a scalar), which overrides the height argument h. If that doesn't help, look at what hc$height contains: as the excerpt shows, cutree stops with this error whenever is.unsorted(tree$height) is TRUE, that is, whenever some merge height is smaller than the one before it.

if(is.null(k)) {
    if(is.unsorted(tree$height))
        stop("the 'height' component of 'tree' is not sorted (increasingly)")
    ## h |--> k
    ## S+6 help(cutree) says k(h) = k(h+), but does k(h-) [continuity]
    ## h < min() should give k = n;
    k <- n+1L - apply(outer(c(tree$height,Inf), h, ">"), 2, which.max)
    if(getOption("verbose")) message("cutree(): k(h) = ", k, domain = NA)
}
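
To make the k suggestion concrete, here is a minimal sketch. It assumes toy random data rather than the asker's file, and cut_by_k is a hypothetical helper name, not part of R. Judging from the excerpt above, the sortedness check only runs when k is NULL, so cutting by k should sidestep the error:

```r
# Hypothetical helper: cut the tree at every possible number of clusters
# (k = n, n-1, ..., 1) instead of at every merge height.
cut_by_k <- function(hc) {
    n <- length(hc$order)        # number of observations in the tree
    cutree(hc, k = n:1)          # one column of cluster labels per k
}

set.seed(1)
x  <- matrix(rnorm(40), ncol = 2)            # 20 toy points, assumed data
hc <- hclust(dist(x), method = "average")

# TRUE here would mean cutree(hc, h = ...) stops with the error in question.
print(is.unsorted(hc$height))

memberships <- cut_by_k(hc)
print(dim(memberships))                      # rows = points, columns = values of k
```

Each column then plays the role of one hc.index column in the original loop, except that it is labelled by the number of clusters rather than by the cut height.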

Upvotes: 1

plof

Reputation: 1304

I'm not much of an R wizard - but I ran into exactly this problem.

A potential answer is described here:

https://stat.ethz.ch/pipermail/r-help/2008-May/163409.html
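
For what it's worth, here is a rough sketch of one possible workaround. It is not necessarily what the linked thread recommends, and it assumes the unsorted heights come from tiny floating-point differences between near-tied merges. The helper name monotone_heights is made up for illustration. It forces the heights to be non-decreasing before cutting by h:

```r
# Hypothetical helper: replace each height by the running maximum so that
# the 'height' component is sorted (non-decreasing), as cutree() requires.
monotone_heights <- function(hc) {
    hc$height <- cummax(hc$height)
    hc
}

# Toy tree (assumed data), with one height nudged down to mimic rounding noise.
set.seed(1)
hc <- hclust(dist(matrix(rnorm(40), ncol = 2)), method = "average")
hc$height[5] <- hc$height[4] - 1e-12
print(is.unsorted(hc$height))        # the tree is now in the failing state

fixed  <- monotone_heights(hc)
groups <- cutree(fixed, h = median(fixed$height))   # cutting by height works again
print(table(groups))
```

This only papers over small inversions. If the heights are badly out of order (centroid and median linkage can produce genuine inversions), cutting by h is not really meaningful.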

Upvotes: 6
