Reputation: 13
I found an identical question, but it has no useful answer because the author didn't provide their files. I am using the DESeq2 library, following section 3.2 of the manual, "Starting from count matrices". I have imported my countdata and coldata from CSV files. I suspect the countdata file is the problem here, but I don't understand what exactly is wrong with it.
My code:
library(DESeq2)
NGS <- read.csv2(paste0(datadir,"/CLN3_NGS_orig.csv"), header = T,stringsAsFactors = F)
Sinfo <- read.csv2(paste0(datadir,"/Sampleinfo.csv"), header = T,stringsAsFactors = F)
head(NGS)
head(Sinfo)
coldata <- DataFrame(Sinfo)
coldata <- lapply(coldata, as.factor)
coldata
lapply(NGSnum, class)
NGSnum <- data.frame(NGS[1], apply(NGS[2:13],2, as.numeric))
NGSFull <- DESeqDataSetFromMatrix(
countData = NGSnum,
colData = coldata,
design = ~ Genotype + Treatment)
NGSFull
NGS$Genotype <- relevel(NGSFull$Genotype, "WT")
deseqNGS <- DESeq(NGS)
res <- results(deseqNGS)
res
My error after applying DESeqDataSetFromMatrix:
Error in `rownames<-`(`*tmp*`, value = colnames(countData)) :
attempt to set 'rownames' on an object with no dimensions
My coldata and countdata files on pastebin: coldata & countdata
By the way, my countdata contains transcripts, and sometimes several transcripts (ENST) correspond to a single gene (ENSG). Can DESeq2 sort this out for me and give me output with genes only? It is easy to convert transcript IDs to gene IDs, but harder to merge several rows into one.
Thank you in advance, Kasia
Upvotes: 1
Views: 4913
Reputation: 1422
As a general rule, Bioconductor questions get a lot more (relevant) attention on the Bioconductor support site. However, I can attempt to give a few pointers. The error you are getting is because your coldata is a list instead of a DataFrame object.
coldata <- lapply(coldata, as.factor)
returns a plain list with one element per column, and a list has no dimensions to set row names on. There are also a few other issues that I've addressed in the code below. Most importantly, NGSnum needs to be an integer matrix. Many RNA-seq count matrices actually contain floating-point values (doubles in R), because the quantification algorithm assigns fractional counts to reads that could have come from multiple genes. I have rounded the values to turn them into integers.
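To see the first point in isolation, here is a small base-R check. It is only a sketch: the two-sample table `sinfo` is made up and stands in for Sampleinfo.csv, and the sample names `s1` and `s2` are placeholders.

```r
# Hypothetical two-sample table standing in for Sampleinfo.csv
sinfo <- data.frame(
  Genotype  = c("WT", "KO"),
  Treatment = c("ctrl", "drug"),
  stringsAsFactors = FALSE
)

as_list <- lapply(sinfo, as.factor)   # plain list of factors, one per column
class(as_list)                        # "list"
dim(as_list)                          # NULL

# Setting row names on an object without dimensions fails;
# capture the error message instead of stopping the script
msg <- tryCatch(
  { rownames(as_list) <- c("s1", "s2"); "no error" },
  error = function(e) conditionMessage(e)
)
msg

as_table <- as.data.frame(as_list)    # back to a rectangular object
dim(as_table)                         # 2 2
sapply(as_table, is.factor)           # every column is now a factor
```

Wrapping the `lapply()` result in `as.data.frame()` keeps the factor conversion while giving back something with rows and columns.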
library(DESeq2)
NGS <- read.csv2("Countdata10.csv", header = TRUE, stringsAsFactors = FALSE)
Sinfo <- read.csv2("Sampleinfo.csv", header = TRUE, stringsAsFactors = FALSE)
# Convert each column to a factor, then wrap the result so coldata is a DataFrame, not a plain list
coldata <- DataFrame(as.data.frame(lapply(Sinfo, as.factor)))
# Drop the ID column and coerce the counts to a numeric matrix
NGSnum <- apply(X = NGS[,-1], MARGIN = 2, FUN = as.numeric)
# Round fractional counts so DESeq2 gets whole numbers
NGSnum <- round(NGSnum)
rownames(NGSnum) <- NGS$Transcript
NGSFull <- DESeqDataSetFromMatrix(
countData = NGSnum,
colData = coldata,
design = ~ Genotype + Treatment)
NGSFull$Genotype <- relevel(NGSFull$Genotype, "WT")
deseqNGS <- DESeq(NGSFull)
res <- results(deseqNGS)
res
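On the side question about transcripts: as far as I know, DESeq2 does not merge ENST rows into ENSG rows on its own, so that step would have to happen before DESeqDataSetFromMatrix. One simple option is to sum the counts per gene with base R's `rowsum()`. The sketch below uses made-up IDs and a hypothetical `tx2gene` lookup vector that you would build from your own annotation. For estimated counts from a quantification tool, the tximport package is the more usual route.

```r
# Hypothetical transcript-level counts: three transcripts, two samples
counts <- matrix(
  c(10, 5, 7,    # sample S1
    20, 1, 3),   # sample S2
  ncol = 2,
  dimnames = list(c("ENST0001", "ENST0002", "ENST0003"), c("S1", "S2"))
)

# Hypothetical transcript-to-gene lookup (names = transcripts, values = genes)
tx2gene <- c(ENST0001 = "ENSG0001",
             ENST0002 = "ENSG0001",
             ENST0003 = "ENSG0002")

# Sum all transcript rows that map to the same gene
gene_counts <- rowsum(counts, group = tx2gene[rownames(counts)])
gene_counts
```

The resulting matrix has one row per gene and can be passed as countData in place of NGSnum.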
Upvotes: 2