In principle, it's very simple to normalize a raw RNA-seq count file, but my raw count file does not include gene lengths.
Where can I get the gene lengths, and how do I match them to the Ensembl IDs? I'm calling edgeR's rpkm on my CPM values, and it returns:
x_rpkm <- rpkm(x_cpm)
"Error in rpkm.default(x_cpm) : argument "gene.length" is missing, with no default"
Thank you
You should have used a .gtf or .gff annotation file when counting reads per gene. First, load that file into R with the GenomicFeatures package.
library("GenomicFeatures")
gtf_txdb <- makeTxDbFromGFF("example.gtf")
Then extract the genes in the imported GTF as a GRanges object with the genes() function, also from GenomicFeatures.
gene_list <- genes(gtf_txdb)
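One caveat: each range returned by genes() runs from the gene's first base to its last, so its width includes introns. For RPKM, many workflows use the total length of the gene's non-overlapping exons instead; with GenomicFeatures this is commonly computed as sum(width(reduce(exonsBy(gtf_txdb, by = "gene")))). The base-R sketch here uses hypothetical toy exon coordinates (not real annotation) to show how the two lengths differ:

```r
# Hypothetical toy exon table: GENE_A has overlapping exons, as happens
# when several transcripts of one gene share exons.
exons <- data.frame(
  gene_id = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  start   = c(100, 150, 400, 1000),
  end     = c(200, 250, 500, 1099)
)

# Length of the union of a gene's exons: sort intervals, merge overlapping
# or adjacent ones, then sum the merged widths (1-based, closed intervals).
union_length <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  total <- 0
  cur_s <- start[1]
  cur_e <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_e + 1) {
      cur_e <- max(cur_e, end[i])           # extend the current merged block
    } else {
      total <- total + (cur_e - cur_s + 1)  # close the block, start a new one
      cur_s <- start[i]
      cur_e <- end[i]
    }
  }
  total + (cur_e - cur_s + 1)
}

by_gene <- split(exons, exons$gene_id)

# Exonic length: merged exons only (introns excluded).
exonic_len <- sapply(by_gene, function(d) union_length(d$start, d$end))

# Genomic span: first to last base, which is what a gene-level width measures.
span_len <- sapply(by_gene, function(d) max(d$end) - min(d$start) + 1)

print(exonic_len)
print(span_len)
```

For GENE_A the span is 401 bp but the merged exonic length is only 252 bp, so using the span would understate its RPKM.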
Converting the gene list to a data.frame gives one row per gene, with its gene_id and its length (the width column) among the other fields.
gene_list <- as.data.frame(gene_list)
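A minimal base-R sketch of the remaining step, assuming the count matrix has Ensembl gene IDs as row names and the gene table has gene_id and width columns like the data.frame above; the IDs (ENSG_TOY1 and so on) and counts are hypothetical toy values. Note that rpkm() expects raw counts (or a DGEList), not CPM: it computes CPM itself and then divides by gene length in kilobases, so passing x_cpm would normalize twice.

```r
# Hypothetical raw counts: rows are gene IDs, columns are samples.
counts <- matrix(c(10, 20, 30,
                   40, 50, 60),
                 nrow = 3,
                 dimnames = list(c("ENSG_TOY1", "ENSG_TOY2", "ENSG_TOY3"),
                                 c("s1", "s2")))

# Hypothetical gene table in a different row order, with an extra gene
# that has no counts (it is simply ignored by match()).
gene_list <- data.frame(
  gene_id = c("ENSG_TOY3", "ENSG_TOY1", "ENSG_TOY2", "ENSG_TOY99"),
  width   = c(2000, 1000, 500, 750)
)

# Look up each counted gene's length by ID, in the count matrix's row order.
len <- gene_list$width[match(rownames(counts), gene_list$gene_id)]
stopifnot(!anyNA(len))  # every counted gene must have a length

# RPKM = counts / (library size in millions) / (gene length in kb).
lib_size <- colSums(counts)
cpm <- t(t(counts) / lib_size) * 1e6
rpkm <- cpm / (len / 1000)  # length vector recycles down the rows

print(rpkm)
```

With edgeR loaded, rpkm(counts, gene.length = len) should give the same values for a plain matrix with default arguments; for a DGEList, edgeR uses its library sizes and normalization factors instead of the raw column sums.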