Reputation: 2291
My R program is as below:
hcluster <- function(dmatrix) {
  imatrix <- NULL
  hc <- hclust(dist(dmatrix), method="average")
  for(h in sort(unique(hc$height))) {
    hc.index <- c(h,as.vector(cutree(hc,h=h)))
    imatrix <- cbind(imatrix, hc.index)
  }
  return(imatrix)
}
dmatrix_file = commandArgs(trailingOnly = TRUE)[1]
print(paste('Reading distance matrix from', dmatrix_file))
dmatrix <- as.matrix(read.csv(dmatrix_file,header=FALSE))
imatrix <- hcluster(dmatrix)
imatrix_file = paste("results",dmatrix_file,sep="-")
print(paste('Writing results to', imatrix_file))
write.table(imatrix, file=imatrix_file, sep=",", quote=FALSE, row.names=FALSE, col.names=FALSE)
print('done!')
My input is a distance matrix (symmetric, of course). When I run the program above on a distance matrix with more than a few thousand records (nothing goes wrong with several hundred), it gives me this error message:
Error in cutree(hc, h = h) :
the 'height' component of 'tree' is not sorted
(increasingly); consider applying as.hclust() first
Calls: hcluster -> as.vector -> cutree
Execution halted
My machine has about 16 GB of RAM and 4 CPUs, so resources should not be the problem.
Can anyone please let me know what the problem is? Thanks!
Upvotes: 6
Views: 3357
Reputation: 7255
Looking at the cutree function here http://code.ohloh.net/file?fid=QM4q0tWQlv2VywAoSr2MfgcNjnA&cid=ki3UJjFJ8jA&s=cutree%20component%20of%20is%20not%20sorted&mp=1&ml=1&me=1&md=1&browser=Default#L1
You could try passing the scalar k (the number of groups), which overrides the height argument. Otherwise, look at what hc$height contains: the error is raised whenever is.unsorted(tree$height) is true, for example if the heights are not in increasing order or are not a plain numeric, complex, character or logical vector.
if(is.null(k)) {
    if(is.unsorted(tree$height))
        stop("the 'height' component of 'tree' is not sorted (increasingly)")
    ## h |--> k
    ## S+6 help(cutree) says k(h) = k(h+), but does k(h-) [continuity]
    ## h < min() should give k = n;
    k <- n+1L - apply(outer(c(tree$height,Inf), h, ">"), 2, which.max)
    if(getOption("verbose")) message("cutree(): k(h) = ", k, domain = NA)
}
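As a rough sketch of both suggestions, the snippet below first checks whether the heights are sorted and then cuts the tree by k instead of h. Because the check quoted above sits inside the if(is.null(k)) branch, it is never reached when k is supplied. The random data and the variable names (x, height_unsorted, ks, membership) are made up for illustration and are not from the question.

```r
# Small synthetic data set; the values are illustrative only.
set.seed(1)
x <- matrix(rnorm(40), nrow = 20)
hc <- hclust(dist(x), method = "average")

# cutree(hc, h = ...) stops with the error from the question when this is TRUE.
height_unsorted <- is.unsorted(hc$height)

# Cutting by number of groups (k) does not go through the height check.
n <- length(hc$order)             # number of observations in the tree
ks <- seq_len(n)                  # every possible number of groups, 1..n
membership <- cutree(hc, k = ks)  # matrix: one row per observation, one column per k
```

Each column of membership gives the cluster assignment for one value of k, which is close to what the loop over heights in the question builds, except that the columns are indexed by group count rather than by cut height.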
Upvotes: 1
Reputation: 1304
I'm not much of an R wizard - but I ran into exactly this problem.
A potential answer is described here:
https://stat.ethz.ch/pipermail/r-help/2008-May/163409.html
Upvotes: 6