Reputation: 127
I stumbled upon a Genbank-formatted file (shown here as a minimal dummy example), which contains a nested feature like this:
FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))
Such a feature crashes the GenBank parser in the current Biopython release (1.59), although it apparently parsed without error in earlier releases (e.g. 1.55). The crash apparently already occurs in 1.57 (see comment below).
From the Biopython bugtracker, it seems that the old locationparser code got removed in 1.56:
From what I could deduce from the format descriptions at ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt and http://www.insdc.org/documents/feature_table.html#3.4.2, this location is most likely invalid (but see comment below).
Can someone comment on this? That is, is this a glitch in Biopython or in the format of the GenBank file?
A full demo file:
LOCUS       XXXXXXXXXXXXXX           240 bp    DNA     circular     17-JAN-2012
DEFINITION  xxxxxx.
KEYWORDS    xx.
SOURCE
  ORGANISM
FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))
                     /vntifkey="1"
                     /label=A label
                     /note="A note"
BASE COUNT       75 a         57 c         42 g         66 t
ORIGIN
        1 tttacaaaac gcattttcaa accttgggta ctaccccctt ttaaatatcc gaatacacta
       61 ataaacgctc tttcctttta ggtaaacccg ccaatatata ctgatacaca ctgatagttt
      121 aaactagatg cagtggccga ccatcagatc tagtaggaaa cagctatgac catgattacg
      181 cattacttat ttaagatcaa ccgtaccagt ataccctgcc agcatgatgg aaacctccct
//
A minimal demo program to show the error (assumes that Biopython 1.59 and Python 2.7 are installed and that the file above is available as "test.gb"):
#!/usr/bin/env python
from Bio import SeqIO
s = SeqIO.read(open("test.gb", "r"), "genbank")
This crashes with
raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: complement(1..145)
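One possible workaround, until the file itself is fixed, is to rewrite the nested location before handing the text to the parser. The sketch below is not part of Biopython; the helper name `collapse_double_complement` is made up for this illustration. It assumes that `complement(complement(X))` for a simple span `X` is meant to be equivalent to `X`, and it only handles inner locations that contain no further parentheses.

```python
import re

# Matches an innermost double complement such as
# complement(complement(1..145)). The captured inner part may not contain
# parentheses, so locations like join(...) are deliberately left alone.
_DOUBLE_COMPLEMENT = re.compile(r"complement\(complement\(([^()]*)\)\)")


def collapse_double_complement(text):
    """Replace complement(complement(X)) with X until nothing changes.

    Hypothetical helper for illustration only; it works on a single
    location string or on the full text of a GenBank record.
    """
    previous = None
    while previous != text:
        previous = text
        text = _DOUBLE_COMPLEMENT.sub(r"\1", text)
    return text


if __name__ == "__main__":
    print(collapse_double_complement("complement(complement(1..145))"))
```

The cleaned text could then be wrapped in a `StringIO` handle and passed to `SeqIO.read` in place of the open file. Note that the substitution is applied blindly, so a qualifier value containing the same pattern would be rewritten as well.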
Upvotes: 2
Views: 334
Reputation: 1614
I believe that is an invalid location. Was this from an NCBI file, or elsewhere?
Note that for Biopython 1.60 (next release) we plan to treat bad locations as a warning rather than an error that stops parsing.
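To illustrate what "a warning rather than an error" means for calling code, here is a generic sketch using only the standard `warnings` module. It is not Biopython's implementation: `BadLocationWarning`, `parse_location` and `parse_all` are hypothetical stand-ins, and the toy parser accepts only plain `start..end` spans.

```python
import warnings


class BadLocationWarning(UserWarning):
    """Hypothetical warning category, used only for this illustration."""


def parse_location(text):
    """Stand-in parser: accept 'start..end', otherwise warn and return None."""
    try:
        start, end = text.split("..")
        return int(start), int(end)
    except ValueError:
        # Warn instead of raising, so the caller can keep going.
        warnings.warn("Ignoring bad location: %r" % text, BadLocationWarning)
        return None


def parse_all(locations):
    """Parse every location and collect warning messages instead of stopping."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        parsed = [parse_location(loc) for loc in locations]
    return parsed, [str(w.message) for w in caught]


if __name__ == "__main__":
    results, messages = parse_all(["1..145", "complement(complement(1..145))"])
    print(results)
    print(messages)
```

With this pattern the remaining features of a record are still returned, and the caller decides whether a recorded warning should be logged, ignored, or escalated to an error via `warnings.simplefilter("error")`.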
Upvotes: 1