pdubois

Reputation: 7800

How to cluster only the columns in R heatmap.2?

I have data that I want to plot as a heatmap, with dendrogram clustering applied only to the columns. How can I achieve that?

The data consists of only one row but multiple columns. Note that I literally want the clustering on the columns; I do not want to transpose the data and cluster the rows instead.

This is the code I have, which doesn't work:

library(gplots)
library(RColorBrewer)
dat.all <- structure(list(Probes = structure(1L, .Label = "1419598_at", class = "factor"), 
    XXX_LV_06.ip = 0.985, XXX_SP_06.ip = 0.932, XXX_LN_06.id = 2.115, 
    XXX_LV_06.id = 1.753, XXX_SP_06.id = 2.668, ZZZ_KD_06.ip = 10.079, 
    ZZZ_LG_06.ip = 2.323, ZZZ_LV_06.ip = 2.119, ZZZ_SP_06.ip = 4.157, 
    ZZZ_LN_06.id = 1.371, ZZZ_LV_06.id = 1.825, ZZZ_SP_06.id = 1.457, 
    ZZZ_KD_24.ip = 0L, ZZZ_LG_24.ip = 1.049, ZZZ_LV_24.ip = 1.372, 
    ZZZ_SP_24.ip = 1.83, AAA_LN_06.id = 1.991, AAA_LV_06.ip = 2.555, 
    AAA_SP_06.ip = 4.209, AAA_LV_06.id = 1.375, AAA_SP_06.id = 0.75, 
    GGG_LV_06.ip = 5.938, GGG_SP_06.ip = 8.326, GGG_LN_06.id = 1.982, 
    GGG_LV_06.id = 0.779, GGG_SP_06.id = 1.383, KKK_LN_06.id = 2.006, 
    KKK_LV_06.ip = 1.253, KKK_SP_06.ip = 1.774, X333_LV_06.id = 1.792, 
    X333_SP_06.id = 1.408, EEE_LV_06.in = 0.881, EEE_SP_06.in = 1.374, 
    DDD_LN_06.id = 2.052, DDD_LV_06.id = 1.363, DDD_SP_06.id = 1.678), .Names = c("Probes", 
"XXX_LV_06.ip", "XXX_SP_06.ip", "XXX_LN_06.id", "XXX_LV_06.id", 
"XXX_SP_06.id", "ZZZ_KD_06.ip", "ZZZ_LG_06.ip", "ZZZ_LV_06.ip", 
"ZZZ_SP_06.ip", "ZZZ_LN_06.id", "ZZZ_LV_06.id", "ZZZ_SP_06.id", 
"ZZZ_KD_24.ip", "ZZZ_LG_24.ip", "ZZZ_LV_24.ip", "ZZZ_SP_24.ip", 
"AAA_LN_06.id", "AAA_LV_06.ip", "AAA_SP_06.ip", "AAA_LV_06.id", 
"AAA_SP_06.id", "GGG_LV_06.ip", "GGG_SP_06.ip", "GGG_LN_06.id", 
"GGG_LV_06.id", "GGG_SP_06.id", "KKK_LN_06.id", "KKK_LV_06.ip", 
"KKK_SP_06.ip", "X333_LV_06.id", "X333_SP_06.id", "EEE_LV_06.in", 
"EEE_SP_06.in", "DDD_LN_06.id", "DDD_LV_06.id", "DDD_SP_06.id"
), row.names = 1L, class = "data.frame")



# Clustering and distance function
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")


height <- 3; 

outdir <- "./";

# Define output file name
heatout <-paste(outdir,base,"myplot.pdf",sep="");

# require(RColorBrewer)
col1 <- colorRampPalette(brewer.pal(12, "Set3"));
col2 <- colorRampPalette(brewer.pal(9, "Set1"));


cl.col <- hclustfunc(distfunc(t(dat.all)))


# extract column cluster assignments by cutting the dendrogram at height 3
gr.col <- cutree(cl.col, h=3)
gr.col.nofclust <- length(unique(as.vector(gr.col)));
clust.col.height <- col2(gr.col.nofclust);
hmcols <- rev(redgreen(2750));

pdf(file=heatout,width=50,height=25);
heatmap.2(as.matrix(dat.all),
                scale='row',
                trace='none',
                Rowv=FALSE,
                col=hmcols,
                symbreak=T,
                hclustfun=hclustfunc,
                distfun=distfunc,
                keysize=0.1,
                margins=c(10,200),
                lwid=c(1,4), lhei=c(0.7,3),
                ColSideColors=clust.col.height[gr.col])
dev.off();

The resulting image looks like this: [image]

Upvotes: 8

Views: 4339

Answers (2)

user2357031

Reputation:

Do you explicitly need to use the heatmap.2() function? If not, I suggest you consider the pheatmap() function from the pheatmap package, since it lets you accomplish what you're after with rather minimal gymnastics.

First of all, I would get rid of the first column in your dataset. However, to retain the information I'd put the Affymetrix ID as a row name in the data frame:

rownames(dat.all)<-dat.all[,1]
dat.all<-dat.all[,-1]
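
As a minimal sketch of why this step matters, the base R snippet below uses a hypothetical four-sample data frame named toy (not the real dat.all) with the same shape: one ID column plus numeric samples. While the ID column is present, t() produces a character matrix, which is not meaningful input for dist(); once the ID is moved into the row names, the samples can be clustered. The distance and linkage methods are the ones from the question.

```r
# Hypothetical one-row data frame shaped like dat.all (ID column + numeric samples)
toy <- data.frame(Probes = "1419598_at", S1 = 0.985, S2 = 2.115, S3 = 10.079, S4 = 8.326)

# With the ID column present, transposing gives a character matrix
with.id.is.character <- is.character(t(toy))

# Move the ID into the row names and keep only the numeric columns
rownames(toy) <- toy[, 1]
toy <- toy[, -1]

# Each sample is now one numeric observation, so the columns can be clustered
toy.t <- t(toy)                                   # 4 x 1 numeric matrix
cl <- hclust(dist(toy.t, method = "maximum"), method = "complete")
gr <- cutree(cl, h = 3)                           # cluster id per sample
```

The same pattern applies to dat.all once its first column has been dropped as shown above.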

After that you can run the rest of your code up until the actual plotting of the heatmap. At that stage you switch to pheatmap(). It works very similarly to heatmap.2(), but the argument names are different. The following command should get you the rest of the way, or close to it:

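library(pheatmap)
# Column annotation: one row per sample, row names matching colnames(dat.all)
ann <- data.frame(cluster=factor(gr.col), row.names=colnames(dat.all))
pheatmap(dat.all, cluster_rows=FALSE, color=hmcols, scale="row",
annotation_col=ann,
annotation_colors=list(cluster=setNames(clust.col.height, levels(ann$cluster))),
clustering_distance_cols=distfunc(t(dat.all)))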

The annotation arguments add the column side colors. If you want to use your own distance function, you can pass its output (a dist object) to pheatmap() through the clustering_distance_cols argument. Please consult the help page of the pheatmap package for further details. An example plot is shown below.

[image: pheatmap plot]

Upvotes: 6

Gavin Kelly

Reputation: 2414

As a bit of a hack to get around the 'each dimension must be two or more' constraint, you can rbind the single row to itself, along the lines of

heatmap.2(rbind(as.numeric(dat.all[,-1]),as.numeric(dat.all[,-1])),...)

though you may need to adjust the labels manually. I took the first column off dat.all (with the [,-1]) because the Affymetrix ID was getting in the way when I copied the data across; you may not need to do that in the real version.
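
A fuller sketch of this workaround might look like the following. It uses only a subset of the question's values so that it is self-contained, and the names vals and mat, the labRow labels, the margins, and the requireNamespace() guard are illustrative assumptions rather than part of the original code. The distance and linkage methods are the ones from the question.

```r
# Subset of the question's values for probe 1419598_at (one row, several samples)
vals <- c(XXX_LV_06.ip = 0.985, XXX_SP_06.ip = 0.932, XXX_LN_06.id = 2.115,
          ZZZ_KD_06.ip = 10.079, ZZZ_SP_06.ip = 4.157,
          GGG_LV_06.ip = 5.938, GGG_SP_06.ip = 8.326)

# Stack the row on itself so the matrix has the two rows heatmap.2() requires
mat <- rbind(vals, vals)

# Plot only if gplots is installed (guard added for this sketch)
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat,
                    Rowv = FALSE,                   # no row clustering or reordering
                    dendrogram = "column",          # draw the column dendrogram only
                    scale = "row",
                    trace = "none",
                    labRow = c("1419598_at", ""),   # label just one of the duplicate rows
                    distfun = function(x) dist(x, method = "maximum"),
                    hclustfun = function(x) hclust(x, method = "complete"),
                    margins = c(8, 8))
}
```

Because the two rows are identical, the duplicate changes only the layout: the heatmap shows two identical bands under a single column dendrogram.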

Upvotes: 1
