Reputation: 43
Excuse the essay. I’ve run a DESeq2 analysis, then taken the counts file, applied the same names, removed any NA values, and created a tibble/table (?) called sigs, which I then turn into a data frame:
sigs <- na.omit(res)
sigs
Looks something like this:
log2 fold change (MLE): condition groupb vs groupa
Wald test p-value: condition groupb vs groupa
DataFrame with 16003 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSSSCG00000048769 82.31674 -0.35837484 0.1217091 -2.9445195 0.00323457 0.0358965
ENSSSCG00000037372 40.49912 0.19133392 0.1472912 1.2990176 0.19393788 0.3612217
ENSSSCG00000027257 1572.05160 0.00319404 0.0743954 0.0429334 0.96575464 0.9791215
ENSSSCG00000029697 494.25472 -0.07424653 0.0665490 -1.1156672 0.26456461 0.4385568
ENSSSCG00000049216 2.54242 -0.42346331 0.5024718 -0.8427604 0.39936246 0.5728141
Then I turn it into a Data frame:
sigs.df <- as.data.frame(sigs)
Trying to show that here:
Description: df [16,003 × 6]
baseMean log2FoldChange lfcSE stat pvalue
<dbl> <dbl> <dbl> <dbl> <dbl>
ENSSSCG00000048769 8.231674e+01 -0.3583748397 0.12170911 -2.9445194769 3.234566e-03
ENSSSCG00000037372 4.049912e+01 0.1913339198 0.14729124 1.2990176317 1.939379e-01
ENSSSCG00000027257 1.572052e+03 0.0031940448 0.07439538 0.0429333738 9.657546e-01
ENSSSCG00000029697 4.942547e+02 -0.0742465345 0.06654900 -1.1156672146 2.645646e-01
Then I try to apply some filters to that data frame (log2 fold change and padj):
sigs.df <- sigs.df[(abs(sigs.df$log2FoldChange)>1) & (sigs.df$padj < 0.05),]
sigs.df
Description: df [426 × 6]
baseMean log2FoldChange lfcSE stat pvalue padj
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
18.859565 1.247705 0.4096202 3.046004 2.319046e-03 3.030462e-02
8.702231 -6.199963 1.5519239 -3.995017 6.468949e-05 4.932854e-03
9.466600 -1.535926 0.4899316 -3.134980 1.718657e-03 2.570514e-02
1099.496033 1.547162 0.3705798 4.174976 2.980168e-05 3.222408e-03
This has 426 rows in it! Then I perform normalisation, transformations, and plot a heatmap:
mat <- counts(dds, normalized = T)[rownames(sigs.df),]
mat
t(apply(mat,1, scale))
dds$condition <- factor(dds$condition, levels = c("Control","Blast"))
mat.z <- t(apply(mat,1, scale))
colnames(mat.z) = rownames(coldata)
mat.z
library(RColorBrewer)
bluegreen <- c("blue", "green")
pal <- colorRampPalette(bluegreen)(100)
par(cex.main=.8)
heatmap(mat.z,cluster_rows = T, cluster_columns = T, column_labels = colnames(mat.z), name = "z-score", col = pal, legend = TRUE,
main = "Heatmap of DEGS Normalized Counts in Pig Samples")
The output heatmap is below.
Qu1: It seems to display only a selection of the genes (rows labelled on the right). How can I get it to display all the genes in detail?
[For those wondering, I haven’t mapped the Ensembl IDs, as there is an issue with BioMart and obtaining the Sus scrofa gene IDs!]
Qu2: I would like to annotate this with the conditions that each sample (bottom of heatmap) was exposed to. The sample conditions and runs (Run 1 and Run 2) are held in the file ‘coldata’, but I am unable to get the heatmap to label/annotate in this way.
I have seen people build a data frame to do this, i.e.:
df <- as.data.frame(file$sampleconditions)
then pass this to pheatmap (annotation_row = df).
However, I can’t seem to get this to work. Should I be labelling my sample IDs with the condition in the same file?
Thanks. Apologies for the haphazardness.
[Image: Rplot_Normalised_Counts_Pig_LF2C>1abs, PPadj<0005.png]
As an example of the above:
I want to add row annotation labels to a pheatmap.
According to the tutorial at https://towardsdatascience.com/pheatmap-draws-pretty-heatmaps-483dab9a3cc, I can pass a data frame to do this.
Here is my data frame:
Sample Condition
1 Sample_Run1HR62_S1_Run1 groupa
2 Sample_Run2HR62_S1_Run2 groupa
3 Sample_Run1HR70_S2_Run1 groupa
4 Sample_Run2HR70_S2_Run2 groupa
5 Sample_Run1HR78_S3_Run1 groupa
6 Sample_Run2HR78_S3_Run2 groupa
7 Sample_Run1HR81_S4_Run1 groupa
8 Sample_Run2HR81_S4_Run2 groupa
9 Sample_Run1HR87_S5_Run1 groupa
10 Sample_Run2HR87_S5_Run2 groupa
11 Sample_Run1HR99_S6_Run1 groupa
12 Sample_Run2HR99_S6_Run2 groupa
13 Sample_Run1HR107_S7_Run1 groupa
14 Sample_Run2HR107_S7_Run2 groupa
15 Sample_Run1HR114_S8_Run1 groupa
16 Sample_Run2HR114_S8_Run2 groupa
17 Sample_Run1HR142_S17_Run1 groupa
18 Sample_Run2HR142_S17_Run2 groupa
19 Sample_Run1HR146_S18_Run1 groupa
20 Sample_Run2HR146_S18_Run2 groupa
21 Sample_Run1HR61_S9_Run1 groupb
22 Sample_Run2HR61_S9_Run2 groupb
23 Sample_Run1HR71_S11_Run1 groupb
24 Sample_Run2HR71_S11_Run2 groupb
25 Sample_Run1HR74_S41_Run1 groupb
26 Sample_Run2HR74_S41_Run2 groupb
27 Sample_Run1HR80_S12_Run1 groupb
28 Sample_Run2HR80_S12_Run2 groupb
29 Sample_Run1HR86_S13_Run1 groupb
30 Sample_Run2HR86_S13_Run2 groupb
31 Sample_Run1HR115_S14_Run1 groupb
32 Sample_Run2HR115_S14_Run2 groupb
33 Sample_Run1HR121_S15_Run1 groupb
34 Sample_Run2HR121_S15_Run2 groupb
35 Sample_Run1HR127_S16_Run1 groupb
36 Sample_Run2HR127_S16_Run2 groupb
37 Sample_Run2HR66_S10_Run2 groupb
38 Sample_Run1HR66_S10_Run1 groupb
Here is the R script I am using to generate the pheatmap:
# Create sample-sample heatmap
sampleDists <- dist(t(assay(rld))) #calculates Euclidean distance. Rld to ensure we have a roughly equal contribution from all genes
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( targets$Sample, sep = " - " )
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists, clustering_distance_cols = sampleDists,col = colors, main = "Heatmap of Sample to Sample Distances in Pig Samples" )
Here is the same code when I add the ‘annotation_row’ argument:
# Create sample-sample heatmap
sampleDists <- dist(t(assay(rld))) #calculates Euclidean distance. Rld to ensure we have a roughly equal contribution from all genes
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( targets$Sample, sep = " - " )
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists, clustering_distance_cols = sampleDists,col = colors,annotation_row = targets, main = "Heatmap of Sample to Sample Distances in Pig Samples" )
Here is the error generated from this:
Error in check.length("fill") :
'gpar' element 'fill' must not be length 0
Any help would be greatly appreciated
Upvotes: 1
Views: 4281
Reputation: 24252
In my opinion, the error is caused by the format of the targets object passed to annotation_row.
Below I try to reproduce the error:
library(pheatmap)
library(RColorBrewer)
targets <- read.table(text="
Sample Group
1 Sample_Run1HR62_S1_Run1 groupa
2 Sample_Run2HR62_S1_Run2 groupa
3 Sample_Run1HR70_S2_Run1 groupa
4 Sample_Run2HR70_S2_Run2 groupa
5 Sample_Run1HR78_S3_Run1 groupa
6 Sample_Run2HR78_S3_Run2 groupa
7 Sample_Run1HR81_S4_Run1 groupa
8 Sample_Run2HR81_S4_Run2 groupa
9 Sample_Run1HR87_S5_Run1 groupa
10 Sample_Run2HR87_S5_Run2 groupa
11 Sample_Run1HR99_S6_Run1 groupa
12 Sample_Run2HR99_S6_Run2 groupa
13 Sample_Run1HR107_S7_Run1 groupa
14 Sample_Run2HR107_S7_Run2 groupa
15 Sample_Run1HR114_S8_Run1 groupa
16 Sample_Run2HR114_S8_Run2 groupa
17 Sample_Run1HR142_S17_Run1 groupa
18 Sample_Run2HR142_S17_Run2 groupa
19 Sample_Run1HR146_S18_Run1 groupa
20 Sample_Run2HR146_S18_Run2 groupa
21 Sample_Run1HR61_S9_Run1 groupb
22 Sample_Run2HR61_S9_Run2 groupb
23 Sample_Run1HR71_S11_Run1 groupb
24 Sample_Run2HR71_S11_Run2 groupb
25 Sample_Run1HR74_S41_Run1 groupb
26 Sample_Run2HR74_S41_Run2 groupb
27 Sample_Run1HR80_S12_Run1 groupb
28 Sample_Run2HR80_S12_Run2 groupb
29 Sample_Run1HR86_S13_Run1 groupb
30 Sample_Run2HR86_S13_Run2 groupb
31 Sample_Run1HR115_S14_Run1 groupb
32 Sample_Run2HR115_S14_Run2 groupb
33 Sample_Run1HR121_S15_Run1 groupb
34 Sample_Run2HR121_S15_Run2 groupb
35 Sample_Run1HR127_S16_Run1 groupb
36 Sample_Run2HR127_S16_Run2 groupb
37 Sample_Run2HR66_S10_Run2 groupb
38 Sample_Run1HR66_S10_Run1 groupb
", header=T)
# Generating a random matrix for my example: 100 rows (genes), one column per sample
rld <- matrix(rnorm(100 * nrow(targets)), ncol = nrow(targets))
sampleDists <- dist(t(rld))
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(targets$Sample)
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists, col = colors,
annotation_row = targets,
main="Heatmap of Sample to Sample Distances in Pig Samples")
Here is the error:
Error in check.length("fill") : 'gpar' element 'fill' must not be length 0
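A quick way to see where this comes from, sketched with small stand-in objects (toy_mat and toy_targets are hypothetical names, not the objects above): pheatmap matches annotation rows to matrix rows by row name, and a data frame read this way is keyed by "1", "2", ... rather than by sample name, so nothing matches.

```r
# Toy stand-ins for sampleDistMatrix and targets (hypothetical, for illustration only)
samples <- c("Sample_A", "Sample_B", "Sample_C", "Sample_D")
toy_mat <- matrix(0, nrow = 4, ncol = 4, dimnames = list(samples, NULL))
toy_targets <- data.frame(Sample = samples,
                          Group = c("groupa", "groupa", "groupb", "groupb"))

# The data frame is keyed by its default row names, not by the Sample column
rownames(toy_targets)   # "1" "2" "3" "4"

# No row name is shared between the matrix and the annotation
shared <- intersect(rownames(toy_mat), rownames(toy_targets))
length(shared)          # 0
```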
To solve the problem, targets needs to be reformatted.
First, the row names of targets must match those of the sampleDistMatrix matrix.
In addition, targets must contain only the Group column.
rownames(targets) <- rownames(sampleDistMatrix)
targets <- targets[, -1, drop=F]
str(targets)
# 'data.frame': 38 obs. of 1 variable:
# $ Group: chr "groupa" "groupa" "groupa" "groupa" ...
pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists, col = colors,
annotation_row = targets,
main="Heatmap of Sample to Sample Distances in Pig Samples")
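The same idea probably carries over to the gene-level z-score heatmap from the question, where the samples are columns, so the annotation would go in annotation_col instead. The following is only a sketch on made-up data: the coldata column names (condition, run), the sample names, and the matrix contents are assumptions, not the asker's real objects. show_rownames and fontsize_row are pheatmap arguments that may help make every gene label readable.

```r
set.seed(1)

# Made-up stand-in for coldata: row names are sample IDs (an assumption)
coldata <- data.frame(
  condition = rep(c("groupa", "groupb"), each = 3),
  run       = rep(c("Run1", "Run2"), times = 3),
  row.names = paste0("Sample_", 1:6)
)

# Made-up stand-in for mat.z: 20 genes x 6 samples, columns named after the samples
mat.z <- matrix(rnorm(20 * 6), nrow = 20,
                dimnames = list(paste0("gene", 1:20), rownames(coldata)))

# Annotation data frame: one row per sample, keyed by the matrix column names
annotation_col <- coldata[, c("condition", "run"), drop = FALSE]
stopifnot(identical(rownames(annotation_col), colnames(mat.z)))

# Draw only if pheatmap is installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat.z,
                     annotation_col = annotation_col,
                     show_rownames = TRUE,  # label every gene
                     fontsize_row = 6)      # shrink labels so they fit
}
```

With several hundred genes the row labels will still be very small; a taller output device (for example a larger height in png() or pdf()) is one way to give them room.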
Upvotes: 2