Reputation: 155
I'm trying to create a heatmap of some gene expression data. The columns are currently labeled s1, s2, s3, etc., but I also have a .txt file with the correct label for each sample. I'm not sure whether I need to modify the CSV file of gene expression levels first, or whether I can attach the labels separately to the data frame I'm preparing for the heatmap. I'm also not sure what format the data frame should be in. I would like to use ggplot2 to create the heatmap, if that matters.
Here's my code so far:
library(ggplot2)
library(dplyr)
library(magrittr)
nci <- read.csv('/Users/myname/Desktop/ML Extra Credit/nci.data.csv')
nci.label <- scan(url("https://web.stanford.edu/~hastie/ElemStatLearn/datasets/nci.label"), what="")
# Create a sample matrix to experiment with
mat <- matrix(rexp(200, rate=.1), ncol=20)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))
mat[1:5,1:5]
It outputs a sample matrix that looks like this:
        sample1   sample2    sample3   sample4   sample5
gene1 32.278434 16.678512  0.4637713  1.016569  3.353944
gene2  8.719729 11.080337  1.5254223  2.392519  3.503191
gene3  2.199697 18.846487 13.6525699 34.963664  2.511097
gene4  5.860673  2.160185  3.5243884  6.785453  3.947606
gene5 16.363688 38.543575  5.6761373 10.142018 22.481752
Any help would be greatly appreciated!!
Upvotes: 0
Views: 2948
Reputation: 13793
You will want to get your dataframe in "long" format to facilitate plotting. This is what's called Tidy Data and forms the basis for preparing data to be plotted using ggplot2.
The general idea here is that you need one column for the x value, one column for the y value, and one column to represent the value used for the tile color. There are lots of ways to do this (see melt(), pivot_longer()...), but I like to use tidyr::gather(). Since you're using rownames instead of a column for gene, I'm first creating that as a column in your dataset.
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(1234)
# create matrix
mat <- matrix(rexp(200, rate=.1), ncol=20)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))
mat[1:5,1:5]
# convert to data.frame and gather
mat <- as.data.frame(mat)
mat$gene <- rownames(mat)
mat <- mat %>% gather(key='sample', value='value', -gene)
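For comparison, here is a rough sketch of the same reshape using pivot_longer(), which current tidyr releases suggest in place of gather() (gather() is marked as superseded). The names wide and long are just illustrative, and the toy matrix is rebuilt so the snippet runs on its own.

```r
library(tidyr)

set.seed(1234)
# same toy matrix as in the example above
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", 1:nrow(mat))
colnames(mat) <- paste0("sample", 1:ncol(mat))

# wide data frame with an explicit gene column
wide <- as.data.frame(mat)
wide$gene <- rownames(wide)

# one row per gene/sample pair; every column except gene is stacked
long <- pivot_longer(wide, cols = -gene, names_to = "sample", values_to = "value")
head(long)
```

The result has the same three columns (gene, sample, value), so it should drop into the same ggplot() call.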
The ggplot call is pretty easy. We assign each column to x, y, and fill aesthetics, then use geom_tile() to create the actual heatmap.
ggplot(mat, aes(sample, gene)) + geom_tile(aes(fill=value))
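As for the labels from the .txt file, one possible approach is to assign them as column names before reshaping, so the CSV itself does not need to be edited. The sketch below is only an illustration: sample_labels is a made-up vector standing in for the real label file, and it assumes the labels are listed in the same order as the columns. If labels repeat, make.unique() keeps the columns distinct so their tiles are not drawn on top of each other.

```r
library(tidyr)
library(ggplot2)

set.seed(1234)
# toy stand-in for the expression data: 10 genes x 4 samples
mat <- matrix(rexp(40, rate = 0.1), ncol = 4)
rownames(mat) <- paste0("gene", 1:nrow(mat))
colnames(mat) <- paste0("s", 1:ncol(mat))

# hypothetical labels, assumed to be in the same order as the columns
sample_labels <- c("CNS", "CNS", "RENAL", "BREAST")
stopifnot(length(sample_labels) == ncol(mat))

# repeated labels become CNS, CNS.1, ... so each column stays distinct
colnames(mat) <- make.unique(sample_labels)

# reshape to long format with an explicit gene column
long <- as.data.frame(mat)
long$gene <- rownames(long)
long <- pivot_longer(long, cols = -gene, names_to = "sample", values_to = "value")

# keep samples in their original column order instead of alphabetical order
long$sample <- factor(long$sample, levels = colnames(mat))

p <- ggplot(long, aes(sample, gene)) + geom_tile(aes(fill = value))
print(p)
```

Converting sample to a factor is optional; without it, ggplot2 sorts the x axis alphabetically.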
Upvotes: 1