r_mvl

Reputation: 109

Convert GENCODE IDs to Ensembl - Ranged SummarizedExperiment

I have an expression set matrix whose rownames are, I think, GENCODE IDs, for example "ENSG00000000003.14", "ENSG00000000457.13", "ENSG00000000005.5" and so on. I would like to convert these to gene symbols, but I am not sure of the best way to do so, especially because of the ".14" or ".13" suffix, which I believe is the version. Should I first trim everything after the dot from each ID and then use biomaRt to convert? If so, what is the most efficient way of doing it? Is there a better way to get to the gene symbol?

Many thanks for your help

Upvotes: 5

Views: 2816

Answers (2)

r_mvl

Reputation: 109

Thanks for the help. My problem was getting rid of the version (.XX) at the end of each Ensembl gene ID. I thought there would be a more straightforward way of going from an Ensembl gene ID that carries the version number (GENCODE basic annotation) to a gene symbol. In the end I did the following, and it seems to work:

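# Strip the version suffix (everything from the first dot onward)
df$ensembl_gene_id <- gsub('\\..+$', '', df$ensembl_gene_id)

library(biomaRt)
# Connect to the Ensembl human gene dataset
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- df$ensembl_gene_id
# Look up the HGNC symbol for each version-stripped Ensembl gene ID
symbol <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "hgnc_symbol"),
                values = genes,
                mart = mart)
# Attach the symbols to the original data frame
df <- merge(x = symbol,
            y = df,
            by = "ensembl_gene_id")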
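One thing to be aware of: with its defaults, merge() keeps only rows present in both tables and sorts the result by the key, so genes without a match are dropped and the row order changes. If that matters, a match()-based lookup may be an alternative. The following is only a minimal base R sketch on toy data: the `symbol` table is a hypothetical stand-in for the getBM() result, the symbol names are placeholders rather than real annotations, and the `versioned_id` column is a name introduced here for illustration.

```r
# Toy versioned IDs standing in for the real data frame.
df <- data.frame(
  ensembl_gene_id = c("ENSG00000000457.13", "ENSG00000000003.14", "ENSG00000000005.5"),
  count = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the getBM() result. The symbols are placeholders,
# and the third ID is deliberately absent to show what happens without a match.
symbol <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000457"),
  hgnc_symbol = c("SYMBOL_A", "SYMBOL_B"),
  stringsAsFactors = FALSE
)

# Keep the original versioned ID, then strip everything from the first dot onward.
df$versioned_id <- df$ensembl_gene_id
df$ensembl_gene_id <- sub("\\..*$", "", df$versioned_id)

# match() gives, for each row of df, the first matching row of `symbol`
# (NA when the ID is absent), so df keeps all its rows in their original order.
idx <- match(df$ensembl_gene_id, symbol$ensembl_gene_id)
df$hgnc_symbol <- symbol$hgnc_symbol[idx]
```

Unmatched genes end up with NA in `hgnc_symbol` instead of disappearing, and if an ID appears more than once in the lookup table only its first symbol is used.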

Upvotes: 0

Manish Goel

Reputation: 893

As already mentioned, these are Ensembl IDs. The first thing you need to do is check your expression set object and identify which database it uses for annotation. The same ID can map to a different gene symbol in a newer (updated) annotation database.

Anyway, assuming the IDs are human, you can use this code to get the gene symbols very easily.

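library(AnnotationDbi)
library(org.Hs.eg.db)       ## Annotation DB

## IDs must not carry the version suffix
ids <- c("ENSG00000000003", "ENSG00000000457", "ENSG00000000005")
## Call select() through its namespace so that a loaded dplyr cannot mask it
gene_symbol <- AnnotationDbi::select(org.Hs.eg.db, keys = ids, columns = "SYMBOL", keytype = "ENSEMBL")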

You can try org.Hs.eg.db, or the exact database your expression set uses (if that information is available).
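Since select() can return more than one row for an ID, a result with exactly one symbol per input ID may be easier to attach back to the rownames. A possible sketch, assuming the AnnotationDbi and org.Hs.eg.db packages are installed, uses mapIds() with multiVals = "first". The helper name `ensembl_to_symbol` is made up for this example and is not part of any package.

```r
# Sketch: map versioned Ensembl gene IDs to gene symbols, one result per input ID.
# Assumes the Bioconductor packages AnnotationDbi and org.Hs.eg.db are installed.
ensembl_to_symbol <- function(versioned_ids) {
  # Drop the version suffix (everything from the first dot onward).
  bare_ids <- sub("\\..*$", "", versioned_ids)

  # mapIds() returns a vector named by key with one entry per key;
  # multiVals = "first" keeps the first symbol when an ID maps to several.
  symbols <- AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = unique(bare_ids),
    column = "SYMBOL",
    keytype = "ENSEMBL",
    multiVals = "first"
  )

  # Re-expand to the input order and label each result with the original versioned ID.
  stats::setNames(unname(symbols[bare_ids]), versioned_ids)
}
```

It could then be called on the rownames of the expression matrix, for example `ensembl_to_symbol(c("ENSG00000000003.14", "ENSG00000000457.13"))`. IDs that the annotation package does not know should come back as NA, so it is worth checking how many are missing before relying on the symbols.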

Upvotes: 2
