AnjaM
AnjaM

Reputation: 3039

Clustering and heatmap in R

I am a newbie to R and I am trying to do some clustering on a data table where rows represent individual objects and columns represent the features that have been measured for these objects. I've worked through some clustering tutorials and I do get some output, however, the heatmap that I get after clustering does not correspond at all to the heatmap produced from the same data table with another programme. While the heatmap of that programme does indicate clear differences in marker expression between the objects, my heatmap doesn't show much differences and I cannot recognize any clustering (i.e., colour) pattern on the heatmap, it just seems to be a randomly jumbled set of colours that are close to each other (no big contrast). Here is an example of the code I am using, maybe someone has an idea on what I might be doing wrong.

mydata <- read.table("mydata.csv")
datamat <- as.matrix(mydata)
datalog <- log(datamat)

I am using log values for the clustering because I know that the other programme does so, too

library(gplots)

hr <- hclust(as.dist(1-cor(t(datalog), method="pearson")), method="complete")
mycl <- cutree(hr, k=7)
mycol <- sample(rainbow(256)); mycol <- mycol[as.vector(mycl)]
heatmap(datamat, Rowv=as.dendrogram(hr), Colv=NA,
    col=colorpanel(40, "black","yellow","green"),
    scale="column", RowSideColors=mycol) 

Again, I plot the original colours but use the log-clusters because I know that this is what the other programme does.

I tried to play around with the methods, but I don't get anything that would at least somehow look like a clustered heatmap. When I take out the scaling, the heatmap becomes extremely dark (and I am actually quite sure that I have somehow to scale or normalize the data by column). I also tried to cluster with k-means, but again, this didn't help. My idea was that the colour scale might not be used completely because of two outliers, but although removing them slightly increased the range of colours plotted on the heatmap, this still did not reveal proper clusters.

Is there anything else I could play around with?

And is it possible to change the colour scale with heatmap so that outliers are found in the last bin that has a range of "everything greater than a particular value"? I tried to do this with heatmap.2 (argument "breaks"), but I didn't quite succeed and also I didn't manage to put the row side colours that I use with the heatmap function.

Upvotes: 4

Views: 11807

Answers (1)

Alos
Alos

Reputation: 2715

If you are okay with using heatmap.2 from the gplots package that will allow you to add breaks to assign colors to ranges represented in your heatmap.
For example if you had 3 colors blue, white, and red with the values going from low to high you could do something like this:

my.breaks <- c(seq(-5, -.6, length.out=6),seq(-.5999999, .1, length.out=4),seq(.100009,5, length.out=7))
result <- heatmap.2(mtscaled, Rowv=T, scale='none', dendrogram="row", symm = T, col=bluered(16), breaks=my.breaks)

In this case you have 3 sets of values that correspond to the 3 colors, the values will differ of course depending on what values you have with your data.

One thing you are doing in your program is to call hclust on your data then to call heatmap on it, however if you look in the heatmap manual page it states: Defaults to hclust. So I don't think you need to do that. You might want to take a look at some similar questions that I had asked that might help to point you in the right direction:

Heatmap Question 1

Heatmap Question 2

If you post an image of the heatmap you get and an image of the heatmap that the other program is making it will be easier for us to help you out more.

Upvotes: 1

Related Questions