Patricia

Reputation: 11

RNA-seq - R - how to import gene length into a dataset for RPKM calculation

So, in principle it's very simple to normalize a raw-count RNA-seq file...

But my raw counts file does not include the gene lengths.

How, and from where, can I import the gene lengths and match them to the Ensembl IDs? I'm using edgeR's rpkm on CPM values, and it returns

x_rpkm <- rpkm(x_cpm)

"Error in rpkm.default(x_cpm) : argument "gene.length" is missing, with no default"

Thank you

Upvotes: 1

Views: 980

Answers (1)

rpolicastro

Reputation: 1305

You should have used a '.gtf' or '.gff' file when counting your reads per gene. First load that file into R using the GenomicFeatures library.

library("GenomicFeatures")
gtf_txdb <- makeTxDbFromGFF("example.gtf")

Then get the list of genes within the imported gtf as a GRanges object using the genes function, again from the GenomicFeatures library.

gene_list <- genes(gtf_txdb)

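If you then convert the gene list to a data.frame, you'll get one row per gene with columns such as seqnames, start, end, width, strand and gene_id. The width column is the gene length, measured as the full genomic span from start to end (introns included).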

gene_list <- as.data.frame(gene_list)
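
To connect this back to the RPKM step, here is a minimal sketch of matching the lengths to the rows of a count matrix. It uses base R only, and everything in it is made up for illustration: the small gene_list data.frame stands in for the real one produced above, and the IDs, counts and the names counts, gene_length and rpkm_manual are hypothetical. It assumes the row names of your count matrix are the same gene IDs that appear in the gene_id column.

```r
# Hypothetical stand-in for as.data.frame(genes(gtf_txdb));
# only the gene_id and width columns are used here.
gene_list <- data.frame(
  gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  width   = c(2000, 500, 1000),
  stringsAsFactors = FALSE
)

# Hypothetical raw count matrix: rows are genes, columns are samples.
# The row order deliberately differs from gene_list.
counts <- matrix(
  c(100, 200,
     50,  50,
    850, 750),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ENSG_B", "ENSG_A", "ENSG_C"), c("s1", "s2"))
)

# Look up each counted gene in gene_list so the lengths follow the row order of counts.
idx <- match(rownames(counts), gene_list$gene_id)
stopifnot(!anyNA(idx))  # stop if any counted gene has no length
gene_length <- gene_list$width[idx]

# RPKM by hand: counts / (length in kb) / (library size in millions).
lib_size <- colSums(counts)
rpkm_manual <- t(t(counts / (gene_length / 1e3)) / (lib_size / 1e6))

# With edgeR loaded, the corresponding call would be:
#   rpkm(counts, gene.length = gene_length)
```

Note that the edgeR call takes the raw counts (or a DGEList) together with gene.length, which is the argument the error message says is missing. If you would rather use exonic length than the full genomic span, a common idiom is sum(width(reduce(exonsBy(gtf_txdb, by = "gene")))), which returns a vector of lengths named by gene ID.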

Upvotes: 1
