I have a .txt file listing RefSeq IDs and peptide sequences. I need to use biomaRt in R to get the corresponding gene IDs for the whole list, and I need to keep the peptide sequence in the final output. How would I do that? Please help. I read the file with:
myData = read.delim("phosphopeptides.txt", header = FALSE)
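Use refseq_peptide as both the filter and an attribute to match your IDs. The code assumes myData has columns named RefSeq and Peptide (with header = FALSE, read.delim names them V1 and V2 instead):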
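library(biomaRt)
# connect to the Ensembl human gene mart
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
# unique RefSeq protein IDs from the input file
refseq_peptide <- unique(myData$RefSeq)
# look up the HGNC symbol for each RefSeq peptide ID
res <- getBM(attributes = c("refseq_peptide", "hgnc_symbol"),
             filters = "refseq_peptide",
             values = refseq_peptide,
             mart = ensembl)
res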
#   refseq_peptide hgnc_symbol
# 1      NP_000007       ACADM
# 2      NP_000009      ACADVL
# 3      NP_000012       PSEN1
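The refseq_peptide values returned above are unversioned (NP_000007, not NP_000007.3). If the IDs in your own file carry version suffixes (the sample here does not), they may fail to match, so it can help to strip the suffix first. A minimal base-R sketch, assuming the version is always a trailing `.<digits>`; the IDs and the helper name strip_version are hypothetical:

```r
# Hypothetical example IDs; ".3" and ".1" are version suffixes
ids <- c("NP_000007.3", "NP_000009.1", "NP_000012")

# hypothetical helper: remove a trailing ".<digits>" so the IDs match
# the unversioned refseq_peptide values returned by Ensembl BioMart
strip_version <- function(x) sub("\\.[0-9]+$", "", x)

ids_clean <- unique(strip_version(ids))
print(ids_clean)
# [1] "NP_000007" "NP_000009" "NP_000012"
```

If you do this, apply the same function to myData$RefSeq before merging so both sides use the same form of the ID.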
# merge the gene symbols back onto the original data, keeping the Peptide column
merge(myData, res, by.x = "RefSeq", by.y = "refseq_peptide")
#      RefSeq                            Peptide hgnc_symbol
# 1 NP_000007                    R.SDPDPKAPANK.A       ACADM
# 2 NP_000009                    K.SDSHPSDALTR.K      ACADVL
# 3 NP_000012 K.YNAESTERESQDTVAENDDGGFSEEWEAQR.D       PSEN1
# 4 NP_000012            R.AAVQELSSSILAGEDPEER.G       PSEN1
# 5 NP_000012            R.AAVQELSSSILAGEDPEER.G       PSEN1
# 6 NP_000012                  R.S*LGHPEPLSNGR.P       PSEN1
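By default merge() does an inner join, so a peptide whose RefSeq ID gets no hit from getBM() is dropped from the result. If you want to keep every peptide, all.x = TRUE does that. The self-contained sketch below uses small made-up data frames in place of the real file and the real getBM() result (the ID NP_999999 and the peptide K.PEPTIDE.R are hypothetical), and also shows col.names as one way to get the RefSeq and Peptide column names when reading with header = FALSE:

```r
# Mock input standing in for phosphopeptides.txt (tab-separated, no header row);
# the third row is made up to show a peptide with no BioMart match
txt <- "NP_000007\tR.SDPDPKAPANK.A\nNP_000012\tR.S*LGHPEPLSNGR.P\nNP_999999\tK.PEPTIDE.R"
demo_data <- read.delim(text = txt, header = FALSE,
                        col.names = c("RefSeq", "Peptide"),
                        stringsAsFactors = FALSE)

# Mock getBM() result with the same columns as 'res' above
demo_res <- data.frame(refseq_peptide = c("NP_000007", "NP_000012"),
                       hgnc_symbol = c("ACADM", "PSEN1"),
                       stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of demo_data; unmatched rows get NA
out <- merge(demo_data, demo_res,
             by.x = "RefSeq", by.y = "refseq_peptide", all.x = TRUE)
print(out)
```

Here out has three rows, and the NP_999999 peptide is kept with NA in hgnc_symbol instead of being dropped.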
Note: when you don't know the exact attribute name, searchAttributes() helps you find it:
searchAttributes(mart = ensembl, pattern = "refseq")
#                        name                 description         page
# 86              refseq_mrna              RefSeq mRNA ID feature_page
# 87    refseq_mrna_predicted    RefSeq mRNA predicted ID feature_page
# 88             refseq_ncrna             RefSeq ncRNA ID feature_page
# 89   refseq_ncrna_predicted   RefSeq ncRNA predicted ID feature_page
# 90           refseq_peptide           RefSeq peptide ID feature_page
# 91 refseq_peptide_predicted RefSeq peptide predicted ID feature_page
searchAttributes(mart = ensembl, pattern = "hgnc")
#               name        description         page
# 64         hgnc_id            HGNC ID feature_page
# 65     hgnc_symbol        HGNC symbol feature_page
# 95 hgnc_trans_name Transcript name ID feature_page