alec_djinn

Reputation: 10789

BLAST via Biopython NCBIWWW. Where can I find the complete database list?

I am using the Biopython module NCBIWWW to BLAST some sequences online. I would like to BLAST my sequences against the different databases available, but I cannot find a comprehensive list of them.

Here is an example of a simple query to the Nucleotide collection database using the "blastn" algorithm.

from Bio.Blast import NCBIWWW

result_handle = NCBIWWW.qblast("blastn", "nt", some_sequence)

As you can see, the Nucleotide collection database is specified as "nt". What should I substitute for "nt" if I want to query, for example, the Human GRCh37/hg19 database? And what if I want to query other species/builds? Is there a comprehensive list somewhere with the short names of all the databases available at http://blast.ncbi.nlm.nih.gov ?

Thanks!

Upvotes: 1

Views: 1504

Answers (3)

Freiburgermsu

Reputation: 13

The drop-down options under "Database" in this interface appear to list the available databases for each type of BLAST. I copied the names of these options into the BLAST sections of my Python module that uses the NCBIWWW function, and it accepted these names as parameters.
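A minimal sketch of that approach, assuming you maintain your own table of names copied from the drop-down. The two dictionaries below are hypothetical and deliberately partial (only nt, est and nr, the names mentioned in this thread), and the helper names `check_program_database` and `blast_against` are made up for illustration. Because qblast passes its arguments to the server unchecked, a local check like this catches obvious mismatches (e.g. blastn against the protein database nr) before any request is sent.

```python
from typing import Dict

# Hypothetical, partial table: the kind of database each BLAST program searches.
PROGRAM_DB_TYPE: Dict[str, str] = {
    "blastn": "nucleotide",
    "tblastn": "nucleotide",
    "tblastx": "nucleotide",
    "blastp": "protein",
    "blastx": "protein",
}

# Hypothetical, partial table of database short names copied from the web form.
DATABASE_TYPE: Dict[str, str] = {
    "nt": "nucleotide",
    "est": "nucleotide",
    "nr": "protein",
}


def check_program_database(program: str, database: str) -> None:
    """Raise ValueError if the program/database pair looks wrong."""
    if program not in PROGRAM_DB_TYPE:
        raise ValueError(f"unknown BLAST program: {program!r}")
    if database not in DATABASE_TYPE:
        raise ValueError(f"database {database!r} is not in the local list")
    if PROGRAM_DB_TYPE[program] != DATABASE_TYPE[database]:
        raise ValueError(
            f"{program} searches {PROGRAM_DB_TYPE[program]} databases, "
            f"but {database!r} is a {DATABASE_TYPE[database]} database"
        )


def blast_against(sequence: str, program: str, database: str):
    """Validate locally, then submit the search to NCBI with Biopython."""
    check_program_database(program, database)
    # Imported here so the validation above works without Biopython installed.
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast(program, database, sequence)
```

The names in the tables still have to come from the web form (or another source); the sketch only keeps your own list consistent.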

Upvotes: 0

Fábio Madeira

Reputation: 136

Judging from the documentation in Biopython's source code at https://github.com/biopython/biopython/blob/master/Bio/Blast/NCBIWWW.py, it queries this API: http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

(...) This function does no checking of the validity of the parameters and passes the values to the server as is. More help is available at: http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

As you can see, Biopython lets you set any parameter of that API, including the 'DATABASE' entry. The real problem, and the actual point of your question, is finding the short name of your database that the API recognizes. The API documentation isn't great and doesn't provide any list of valid database names (a problem that is entirely independent of Biopython).
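To make the "passes the values to the server as is" point concrete, here is a sketch of the parameters of a URL API submission (CMD=Put with PROGRAM, DATABASE and QUERY, as described on the urlapi page). The helper name `build_put_request` is hypothetical, and the endpoint below is, as far as I know, the one qblast posts to by default in recent Biopython versions. Whatever string you put in DATABASE is sent unchanged, so only the server can tell you whether it is a valid short name.

```python
from urllib.parse import urlencode

# NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_request(program: str, database: str, query: str) -> str:
    """Return the form-encoded body of a CMD=Put (submit search) request.

    DATABASE is passed through untouched: nothing here checks that the
    short name exists, just as qblast does not check it.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return urlencode(params)


body = build_put_request("blastn", "nt", "ACGTACGTACGT")
print(body)  # CMD=Put&PROGRAM=blastn&DATABASE=nt&QUERY=ACGTACGTACGT
# Submitting it (needs network access) would mean POSTing `body` to BLAST_URL,
# e.g. urllib.request.urlopen(BLAST_URL, data=body.encode("ascii")).
```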

I found these lists at EBI, which may help even though they don't solve the problem:

http://www.ebi.ac.uk/Tools/sss/ncbiblast/help/index-nucleotide.html
http://www.ebi.ac.uk/Tools/sss/ncbiblast/help/index-protein.html

Another approach would be to look at how NCBI names its databases on the public FTP site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/
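A sketch of that FTP approach, assuming the archives follow the pattern `<name>.tar.gz` or `<name>.<NN>.tar.gz` for multi-volume databases (e.g. nt.00.tar.gz, nt.01.tar.gz). The function names are hypothetical, and the file names in the usage line are illustrative rather than a real listing. Note that the FTP names are not guaranteed to match the web DATABASE values one-to-one, so treat the result as a list of candidates.

```python
import posixpath
import re
from ftplib import FTP
from typing import Iterable, List


def database_names(filenames: Iterable[str]) -> List[str]:
    """Collapse archive names such as 'nt.00.tar.gz' to 'nt'."""
    names = set()
    for filename in filenames:
        # Some servers return full paths from NLST; keep only the file name.
        filename = posixpath.basename(filename)
        if not filename.endswith(".tar.gz"):
            continue  # skip .md5 checksums, metadata and README files
        stem = filename[: -len(".tar.gz")]
        # Large databases are split into numbered volumes: nt.00, nt.01, ...
        stem = re.sub(r"\.\d+$", "", stem)
        names.add(stem)
    return sorted(names)


def list_ftp_databases(host: str = "ftp.ncbi.nlm.nih.gov",
                       path: str = "/blast/db") -> List[str]:
    """Fetch the directory listing over anonymous FTP (needs network access)."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        return database_names(ftp.nlst(path))


# Offline usage with illustrative file names:
print(database_names(["nt.00.tar.gz", "nt.01.tar.gz", "nt.00.tar.gz.md5", "nr.00.tar.gz"]))
```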

Hope this helps. Fábio

Upvotes: 2

Sharif Mamun

Reputation: 3554

You can simply go to http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=tblastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome and open the Database drop-down list; you will find the database names there, such as nr, nt, est, etc.

Try http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&BLAST_SPEC=OGP__9606__9558 for the human genome.
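It can help to look at what that link actually encodes. The sketch below just splits the query string with the standard library: the page is selected through BLAST_SPEC rather than a DATABASE parameter, so it seems you would still need to read the database short name from that page's drop-down. The 9606 component matches the NCBI Taxonomy ID for Homo sapiens; the meaning of the other BLAST_SPEC fields is not explained here.

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch"
       "&PROG_DEF=blastn&BLAST_PROG_DEF=megaBlast&BLAST_SPEC=OGP__9606__9558")

# parse_qs maps each parameter name to a list of values; keep the first one.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

print(params["BLAST_SPEC"])              # OGP__9606__9558
print("DATABASE" in params)              # False
print(params["BLAST_SPEC"].split("__"))  # ['OGP', '9606', '9558']
```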

Upvotes: 0
