Reputation: 933
I am working with bacterial sequences from the NCBI Nucleotide database. If I have an accession, e.g. NC_002663, and I need the annotations in GFF, how can I easily get them using Entrez (preferably Biopython)?
If I go to the NCBI entry, I see a link to the assembly. Is there an easy way to access it programmatically? The ESummary service doesn't return such links:
from Bio import Entrez
handle = Entrez.esummary(db='nucleotide', id='NC_002663')
record = Entrez.read(handle)
[DictElement({'Item': [], 'Id': '15601865', 'Caption': 'NC_002663', 'Title': 'Pasteurella multocida subsp. multocida str. Pm70, complete genome', 'Extra': 'gi|15601865|ref|NC_002663.1|[15601865]', 'Gi': IntegerElement(15601865, attributes={}), 'CreateDate': '2001/09/10', 'UpdateDate': '2018/01/11', 'Flags': IntegerElement(800, attributes={}), 'TaxId': IntegerElement(272843, attributes={}), 'Length': IntegerElement(2257487, attributes={}), 'Status': 'live', 'ReplacedBy': '', 'Comment': ' ', 'AccessionVersion': 'NC_002663.1'}, attributes={})]
I could maybe search the Assembly db with the "Title", but it seems there could be a better way (without as many API calls). Thanks!
Upvotes: 0
Views: 279
Reputation: 1328
I am not sure whether NCBI Nucleotide allows programmatic GFF download (via the `efetch` function) yet. You can fetch FASTA or GenBank files that way, but GFFs were not listed.
You can retrieve the GenBank file with the `Entrez.efetch` function and convert it to GFF, or download the GFF by other means (`wget` or other).
Also, there is a `biomart` package. Its R implementation mentions a function, `getGFF`, which can query several databases (though not the Nucleotide database). You could check whether its Python implementation offers the same functionality and whether you can find the same files from there.
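As for the assembly link mentioned in the question, one possible route is to follow it with ELink and then read the FTP directory from the assembly summary. The snippet below is only a rough sketch, not something I have verified against every record. The helper names `gff_url_from_ftp_path` and `assembly_ftp_path` are made up for illustration. It assumes that `elink` accepts the accession directly (if it does not, pass the numeric `Id` from the ESummary output instead), that the assembly summary exposes an `FtpPath_RefSeq` field, and that the annotation file follows the usual `<directory name>_genomic.gff.gz` naming.

```python
import posixpath


def gff_url_from_ftp_path(ftp_path):
    """Build the URL of the GFF file inside an assembly FTP directory.

    Assumption: files in the directory are named
    <directory name>_genomic.gff.gz.
    """
    ftp_path = ftp_path.rstrip("/")
    prefix = posixpath.basename(ftp_path)
    return f"{ftp_path}/{prefix}_genomic.gff.gz"


def assembly_ftp_path(accession, email):
    """Follow the nucleotide -> assembly link and return the RefSeq FTP directory."""
    # Imported here so the pure helper above works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    # Call 1: find the assembly UID linked to the nucleotide record.
    handle = Entrez.elink(dbfrom="nuccore", db="assembly", id=accession)
    links = Entrez.read(handle)
    handle.close()
    linksets = links[0]["LinkSetDb"]
    if not linksets:
        raise LookupError(f"no assembly linked to {accession}")
    assembly_uid = linksets[0]["Link"][0]["Id"]

    # Call 2: read the FTP directory from the assembly summary.
    handle = Entrez.esummary(db="assembly", id=assembly_uid)
    summary = Entrez.read(handle, validate=False)
    handle.close()
    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    return doc["FtpPath_RefSeq"]
```

Something like `gff_url_from_ftp_path(assembly_ftp_path("NC_002663", "you@example.org"))` (placeholder email) should then give a URL you can pass to `wget`. That is two API calls, with no need to search the Assembly db by title.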
Upvotes: 1