kgyk1993

Reputation: 31

Draw a heatmap with "super big" matrix

I want to draw a heatmap.
I have a 100k x 100k square matrix (a 50 GB CSV file; the values are in the upper-right triangle and everything else is filled with 0).

How can I draw a heatmap of such a huge dataset in R?
This is the code I'm trying on a machine with a lot of RAM:

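# read the CSV and convert it to a numeric matrix before doing arithmetic
d <- as.matrix(read.table("data.csv", sep = ","))
# mirror the upper triangle into the lower one (note: this doubles the diagonal)
d <- d + t(d)
heatmap(d)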

I also tried other functions such as heatmap.2 (in gplots).
But they take too much time and memory.

Upvotes: 3

Views: 3147

Answers (1)

digEmAll

Reputation: 57220

I suggest heavily downsampling your matrix before plotting the heatmap, e.g. by taking the mean of each submatrix (as suggested by @IaroslavDomin):

# example of a big 10k x 10k matrix
bigMx <- matrix(rnorm(10000*10000,mean=0,sd=100),10000,10000)

# here we downsample the big matrix 10k x 10k to 100x100
# by averaging each submatrix
downSampledMx <- matrix(NA,100,100)
subMxSide <- nrow(bigMx)/nrow(downSampledMx)
for(i in 1:nrow(downSampledMx)){
  rowIdxs <- ((subMxSide*(i-1)):(subMxSide*i-1))+1
  for(j in 1:ncol(downSampledMx)){
    colIdxs <- ((subMxSide*(j-1)):(subMxSide*j-1))+1
    downSampledMx[i,j] <- mean(bigMx[rowIdxs,colIdxs])
  }
}

# NA to disable the dendrograms
heatmap(downSampledMx,Rowv=NA,Colv=NA) 

[Figure: heatmap of the downsampled 100 x 100 matrix]

With your huge matrix, computing downSampledMx will certainly take a while, but it should be feasible.
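A 100k x 100k matrix of doubles needs about 80 GB in memory (100000^2 values at 8 bytes each), so it may be worth never loading it whole. The following is only a sketch of that idea: it reads the CSV one band of rows at a time and averages each band immediately. The function name `downSampleCsv` and its parameters are hypothetical, and it assumes a file with no header row, comma-separated numeric values, and a side length that is an exact multiple of `newSize`.

```r
# Hypothetical helper: block-average a square CSV matrix without loading it whole.
# Assumes: no header row, comma-separated numeric values, n divisible by newSize.
downSampleCsv <- function(path, n, newSize) {
  side <- n / newSize                             # side of each averaged block
  stopifnot(side == round(side))
  out <- matrix(NA_real_, newSize, newSize)
  colGroup <- rep(seq_len(newSize), each = side)  # output column of each input column
  con <- file(path, open = "r")
  on.exit(close(con))
  for (i in seq_len(newSize)) {
    # read the next `side` lines of the file as one long numeric vector
    vals <- scan(con, what = numeric(), sep = ",", nlines = side, quiet = TRUE)
    band <- matrix(vals, nrow = side, ncol = n, byrow = TRUE)
    # sum every column of the band, add up the columns of each block,
    # then divide by the number of cells per block to get the block means
    out[i, ] <- tapply(colSums(band), colGroup, sum) / (side * side)
  }
  out
}
```

For the question's file this could be called as `ds <- downSampleCsv("data.csv", n = 100000, newSize = 100)`, which keeps roughly 1000 x 100000 values in memory at a time. Because block averaging is linear, the symmetrized heatmap can then be drawn from `ds + t(ds)` instead of building `d + t(d)` at full size.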


EDIT :

I think downsampling should preserve recognizable "macro-patterns"; see the following example:

# create a matrix with some recognizable pattern
set.seed(123)
bigMx <- matrix(rnorm(50*50,mean=0,sd=100),50,50)
diag(bigMx) <- max(bigMx) # set maximum value on the diagonal
# set maximum value on a circle centered on the middle
for(i in 1:nrow(bigMx)){
  for(j in 1:ncol(bigMx)){
    if(abs((i - 25)^2 + (j - 25)^2 - 10^2) <= 16)
      bigMx[i,j] <- max(bigMx)
  }
}

# plot the original heatmap
heatmap(bigMx,Rowv=NA,Colv=NA, main="original")


# function used to downsample
# (assumes a square matrix whose side is an exact multiple of newSize)
downSample <- function(m,newSize){
  downSampledMx <- matrix(NA,newSize,newSize)
  subMxSide <- nrow(m)/nrow(downSampledMx)
  for(i in 1:nrow(downSampledMx)){
    rowIdxs <- ((subMxSide*(i-1)):(subMxSide*i-1))+1
    for(j in 1:ncol(downSampledMx)){
      colIdxs <- ((subMxSide*(j-1)):(subMxSide*j-1))+1
      downSampledMx[i,j] <- mean(m[rowIdxs,colIdxs])
    }
  }
  return(downSampledMx)
}

# downsample x 2 and plot heatmap
downSampledMx <- downSample(bigMx,25)
heatmap(downSampledMx,Rowv=NA,Colv=NA, main="downsample x 2") 

# downsample x 5 and plot heatmap
downSampledMx <- downSample(bigMx,10)
heatmap(downSampledMx,Rowv=NA,Colv=NA, main="downsample x 5") 

Here are the three heatmaps:

[Figure: three heatmaps titled "original", "downsample x 2" and "downsample x 5"]
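On a large matrix the double loop in `downSample()` can become slow. As a possible alternative, here is a sketch that computes the same block means with two calls to base R's `rowsum()`. The name `downSampleFast` is hypothetical, and it makes the same assumptions as the loop version (square input, side divisible by `newSize`).

```r
# Hypothetical vectorized variant of downSample(); assumes a square matrix
# whose side is an exact multiple of newSize.
downSampleFast <- function(m, newSize) {
  side <- nrow(m) / newSize
  stopifnot(ncol(m) == nrow(m), side == round(side))
  grp <- rep(seq_len(newSize), each = side)  # block index of every row/column
  byRowBlock <- rowsum(m, grp)               # newSize x n: sums over each row block
  blockSums <- rowsum(t(byRowBlock), grp)    # newSize x newSize, still transposed
  out <- t(blockSums) / (side * side)        # block sums -> block means
  dimnames(out) <- NULL                      # drop the group labels added by rowsum()
  out
}
```

It is meant as a drop-in replacement, e.g. `heatmap(downSampleFast(bigMx, 10), Rowv=NA, Colv=NA)`.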

Upvotes: 9
