DianaHelen

Reputation: 111

Randomizations and hierarchical tree

I am trying to permute my data matrix (column-wise only) 1000 times and then do hierarchical clustering in R, so that I end up with a final tree for my data after the 1000 randomizations. This is where I am lost. I have this loop:

    for (i in 1:1000)
    {
        # this permutes my columns (12 columns drawn with replacement)
        permuted <- test2_matrix[, sample(ncol(test2_matrix), 12, replace = TRUE)]
        d <- dist(permuted, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
        clust <- hclust(d, method = "complete", members = NULL)
    }
    png(filename = "cluster_dendrogram_bootstrap.png", width = 1024, height = 1024, pointsize = 10)
    plot(clust)
    dev.off()  # close the device so the PNG file is written

I am not sure whether the final tree is the product of all 1000 randomizations or just the last tree calculated in the loop. Also, if I want to display the bootstrap values on the tree, how should I go about it?

Many thanks!!

Upvotes: 1

Views: 256

Answers (2)

Michael Dunn

Reputation: 8313

The value of clust in your example is indeed just the last tree calculated in the loop; each iteration overwrites the previous one. Here's a way of building and saving a tree for each of 1000 permutations of your matrix:

    # Builds one tree from a column-resampled copy of test2_matrix.
    # The argument i is not used; it only lets lapply() call the function once per index.
    make.permuted.clust <- function(i) {
        permuted <- test2_matrix[, sample(ncol(test2_matrix), 12, replace = TRUE)]
        d <- dist(permuted, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
        clust <- hclust(d, method = "complete", members = NULL)
        clust  # return value
    }

    all.clust <- lapply(1:1000, make.permuted.clust)  # list of 1000 hclust trees

The second part of your question should be answered here.
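
For the bootstrap values themselves, one base-R approach is to build a reference tree from the unresampled matrix and count, for each of its clusters, how often a cluster with exactly the same members reappears in the resampled trees. The following is only a sketch under stated assumptions: the random matrix stands in for test2_matrix, 200 resamples are used to keep it quick, and cluster_rows, clade_sets, n_boot and support are hypothetical names, not part of any package.

```r
set.seed(1)
# Hypothetical stand-in for test2_matrix: 10 rows to cluster, 12 columns
test2_matrix <- matrix(rnorm(10 * 12), nrow = 10,
                       dimnames = list(paste0("row", 1:10), NULL))

# Same distance and linkage settings as in the question
cluster_rows <- function(m) {
  hclust(dist(m, method = "euclidean"), method = "complete")
}

# Member set of every internal node of an hclust tree,
# encoded as a string of sorted row indices (e.g. "2,5,7")
clade_sets <- function(hc) {
  members <- vector("list", nrow(hc$merge))
  for (k in seq_len(nrow(hc$merge))) {
    parts <- lapply(hc$merge[k, ], function(j) {
      # negative entry = single observation, positive entry = earlier merge
      if (j < 0) -j else members[[j]]
    })
    members[[k]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = ",")
}

ref_tree <- cluster_rows(test2_matrix)   # tree on the original data
ref_clades <- clade_sets(ref_tree)

n_boot <- 200                            # assumption: raise to 1000 for real use
hits <- integer(length(ref_clades))
for (b in seq_len(n_boot)) {
  cols <- sample(ncol(test2_matrix), replace = TRUE)  # bootstrap the columns
  boot_tree <- cluster_rows(test2_matrix[, cols, drop = FALSE])
  hits <- hits + (ref_clades %in% clade_sets(boot_tree))
}
support <- round(100 * hits / n_boot)    # percent support per internal node

# Horizontal position of each internal node: midpoint of its two children
leaf_x <- order(ref_tree$order)          # position of each observation on the x axis
node_x <- numeric(nrow(ref_tree$merge))
for (k in seq_along(node_x)) {
  node_x[k] <- mean(sapply(ref_tree$merge[k, ], function(j) {
    if (j < 0) leaf_x[-j] else node_x[j]
  }))
}

plot(ref_tree)
text(node_x, ref_tree$height, labels = support, pos = 3, cex = 0.8, col = "red")
```

Wrapping the last two lines in png(...) and dev.off() writes the annotated tree to a file. The root always gets 100 because it contains every row. The pvclust package offers a packaged take on the same idea; note that it clusters the columns of its input, so the matrix would need transposing to cluster rows.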

Upvotes: 1

Etienne Low-Décarie

Reputation: 13443

You may be interested in the random forest method implemented in the randomForest package, which bootstraps both the data and the splitting variables, and allows you to save the trees and get a consensus across them.

    library(randomForest)

The original random forest (in FORTRAN 77) developers' site

The package manual
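
To connect this back to hierarchical clustering, one possibility is to run randomForest without a response (its unsupervised mode) and cluster the resulting proximity matrix. This is a sketch under assumptions, not a recommendation from the package authors: the random matrix stands in for test2_matrix, and rf and rf_tree are hypothetical names.

```r
library(randomForest)

set.seed(1)
# Hypothetical stand-in for test2_matrix: 10 rows to cluster, 12 columns
test2_matrix <- matrix(rnorm(10 * 12), nrow = 10,
                       dimnames = list(paste0("row", 1:10), paste0("col", 1:12)))

# With no response supplied, randomForest() runs in unsupervised mode;
# proximity = TRUE records how often each pair of rows shares a terminal node
rf <- randomForest(x = test2_matrix, ntree = 1000, proximity = TRUE)

# Convert proximities (1 = always together) to dissimilarities and cluster them
rf_tree <- hclust(as.dist(1 - rf$proximity), method = "complete")
plot(rf_tree)
```

The tree obtained this way summarizes all 1000 bootstrapped trees at once, but it does not attach a support value to individual branches.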

Upvotes: 0
