Reputation: 1145
I have several SNP IDs (e.g., rs16828074, rs17232800, etc.) and I want to get their coordinates in the hg19 genome from the UCSC genome website.
I would prefer to use R to accomplish this. How can I do that?
Upvotes: 11
Views: 9800
Reputation: 2083
With Perl, it is fairly easy to write code that queries for SNPs.
There is a web-based GUI tool (HERE) that builds Perl scripts for the database and dataset you wish to query, using the BioMart library.
Instructions
Select the database and dataset:
Click on the "perl" button to generate Perl code that queries the BioMart API, copy and paste the code into your editor, and run it with the SNP rsNumbers of your choice.
# An example script demonstrating the use of BioMart API.
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/.";
my $action = 'cached';
my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry'=>$registry, 'virtualSchemaName'=>'default');

$query->setDataset("hsapiens_snp");
# Restrict the query to the rsNumbers of interest; this filter line is an
# addition to the generated script and assumes the filter is named "snp_filter".
$query->addFilter("snp_filter", ["rs16828074", "rs17232800"]);
$query->addAttribute("refsnp_id");
$query->addAttribute("refsnp_source");
$query->addAttribute("chr_name");
$query->addAttribute("chrom_start");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET RESULTS ##########################
$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################
Upvotes: 4
Reputation: 2083
Use Bioconductor's biomaRt R package.
It provides an easy way to send queries to BioMart, which returns information about SNPs given an rsNumber (i.e. rsID).
E.g. to import SNP data for rs16828074 (an rsNumber you listed in the post), use this:
Code:
library(biomaRt)
snp.id <- "rs16828074"  # an SNP rsNumber like the one listed in the post
# Select the Ensembl SNP database and the human SNP dataset
snp.db <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# Fetch the record for this SNP from the human dataset
nt.biomart <- getBM(attributes = c("refsnp_id", "allele", "chr_name", "chrom_start",
                                   "chrom_strand", "associated_gene",
                                   "ensembl_gene_stable_id"),
                    filters = "snp_filter",
                    values = snp.id,
                    mart = snp.db)
Let me know how you get on with this (via comments) since I assume some basic coding and package importing ability in my answer here.
Acknowledgement:
Credit goes to Jorge Amigo (for his post on Biostars).
Upvotes: 3
Reputation: 14667
Here is a solution using the Bioconductor package biomaRt. It is a slightly corrected and reformatted version of the previously posted code.
library(biomaRt) # biomaRt_2.30.0
snp_mart = useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
snp_ids = c("rs16828074", "rs17232800")
snp_attributes = c("refsnp_id", "chr_name", "chrom_start")
snp_locations = getBM(attributes=snp_attributes, filters="snp_filter",
values=snp_ids, mart=snp_mart)
snp_locations
# refsnp_id chr_name chrom_start
# 1 rs16828074 2 232318754
# 2 rs17232800 18 66292259
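The question asks specifically for hg19 (GRCh37) coordinates in UCSC style, while the default Ensembl mart follows whichever assembly Ensembl currently serves. The following is a minimal sketch, not a tested recipe: it assumes that the installed biomaRt provides useEnsembl() with its GRCh argument and that the GRCh37 archive is reachable. The helper names fetch_snp_locations_grch37 and to_ucsc_style are hypothetical and introduced here only for illustration.

```r
# Hypothetical helper: query the Ensembl GRCh37 (hg19) archive for SNP positions.
# Assumes biomaRt is installed and that useEnsembl() accepts GRCh = 37.
fetch_snp_locations_grch37 <- function(snp_ids) {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Package 'biomaRt' is required for this query.")
  }
  mart <- biomaRt::useEnsembl(biomart = "snps", dataset = "hsapiens_snp", GRCh = 37)
  biomaRt::getBM(attributes = c("refsnp_id", "chr_name", "chrom_start"),
                 filters = "snp_filter", values = snp_ids, mart = mart)
}

# Hypothetical helper: add a UCSC-style position string to a result table.
# Ensembl reports chromosomes as "2", "X", "MT"; UCSC uses "chr2", "chrX", "chrM".
# Any other sequence name simply receives the "chr" prefix unchanged.
to_ucsc_style <- function(snp_locations) {
  chrom <- as.character(snp_locations$chr_name)
  chrom[chrom == "MT"] <- "M"
  snp_locations$ucsc_position <- paste0("chr", chrom, ":", snp_locations$chrom_start)
  snp_locations
}

# Offline demonstration using the table printed in this answer
example_locations <- data.frame(
  refsnp_id = c("rs16828074", "rs17232800"),
  chr_name = c("2", "18"),
  chrom_start = c(232318754L, 66292259L),
  stringsAsFactors = FALSE
)
ucsc_locations <- to_ucsc_style(example_locations)
print(ucsc_locations)
```

With a live connection, to_ucsc_style(fetch_snp_locations_grch37(snp_ids)) would chain the two steps. It is worth checking a few of the returned positions against the UCSC browser before relying on them.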
Users are encouraged to read the comprehensive biomaRt vignette and experiment with the following biomaRt functions:
listFilters(snp_mart)
listAttributes(snp_mart)
attributePages(snp_mart)
listDatasets(snp_mart)
listMarts()
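The tables returned by listFilters() and listAttributes() can be long, so it may help to search them by keyword. Below is a small sketch under the assumption that those tables are data frames with name and description columns. The helper search_mart_table is hypothetical, and the demo rows are made-up stand-ins rather than real server output.

```r
# Hypothetical helper: keep the rows of a listFilters()/listAttributes() table
# whose name or description matches a pattern (case-insensitive).
search_mart_table <- function(mart_table, pattern) {
  hit <- grepl(pattern, mart_table$name, ignore.case = TRUE) |
    grepl(pattern, mart_table$description, ignore.case = TRUE)
  mart_table[hit, , drop = FALSE]
}

# Offline stand-in for listFilters(snp_mart); the rows are illustrative only
demo_filters <- data.frame(
  name = c("chr_name", "snp_filter", "start"),
  description = c("Chromosome name", "Variant name filter", "Start position"),
  stringsAsFactors = FALSE
)
snp_hits <- search_mart_table(demo_filters, "snp")
print(snp_hits)

# With a live connection, the same call would be:
# search_mart_table(listFilters(snp_mart), "snp")
```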
Upvotes: 13