Marc

Reputation: 127

Is this a valid Genbank feature description or a Biopython bug?

I stumbled upon a Genbank-formatted file (shown here as a minimal dummy example), which contains a nested feature like this:

FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))

Such a feature crashes the current Biopython Genbank parser (release 1.59), but apparently did not in earlier releases (e.g. 1.55). The crash apparently already occurred in 1.57 (see comment below).

According to the Biopython bug tracker, the old locationparser code was removed in 1.56.

From what I could deduce from the format descriptions at ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt and http://www.insdc.org/documents/feature_table.html#3.4.2, this location is most likely invalid (but see comment below).

Can someone comment on this, i.e. is this a glitch in Biopython or in the format of the Genbank file?

A full demo file:

LOCUS       XXXXXXXXXXXXXX         240 bp    DNA     circular     17-JAN-2012
DEFINITION  xxxxxx.
KEYWORDS    xx.
SOURCE      
  ORGANISM  
FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))
                     /vntifkey="1"
                     /label=A label
                     /note="A note"
BASE COUNT       75 a        57 c        42 g        66 t 
ORIGIN
        1 tttacaaaac gcattttcaa accttgggta ctaccccctt ttaaatatcc gaatacacta 
       61 ataaacgctc tttcctttta ggtaaacccg ccaatatata ctgatacaca ctgatagttt 
      121 aaactagatg cagtggccga ccatcagatc tagtaggaaa cagctatgac catgattacg 
      181 cattacttat ttaagatcaa ccgtaccagt ataccctgcc agcatgatgg aaacctccct 
//

A minimal demo program to show the error (assumes Biopython 1.59 and Python 2.7 are installed and the above-mentioned file is available as "test.gb"):

#!/usr/bin/env python
from Bio import SeqIO
s = SeqIO.read(open("test.gb", "r"), "genbank")

This crashes with

    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: complement(1..145)

Upvotes: 2

Views: 334

Answers (1)

Peter Cock

Reputation: 1614

I believe that is an invalid location. Was this from an NCBI file, or elsewhere?

Note that for Biopython 1.60 (next release) we plan to treat bad locations as a warning rather than an error that stops parsing.
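
Until then, one possible workaround is to clean the file before parsing. The following is only a minimal sketch, not part of Biopython: the helper name `collapse_double_complement` is hypothetical, and it assumes that a doubled complement is meant to denote the forward strand, which may not be what the program that wrote the file intended. It only handles inner locations without further parentheses (so not `join(...)`).

```python
import re

# Matches complement(complement(X)) where X contains no parentheses.
_DOUBLE_COMPLEMENT = re.compile(r"complement\(complement\(([^()]*)\)\)")


def collapse_double_complement(text):
    """Replace complement(complement(X)) with X until nothing changes.

    Assumption: complementing twice is equivalent to the forward strand.
    """
    previous = None
    while previous != text:
        previous = text
        text = _DOUBLE_COMPLEMENT.sub(r"\1", text)
    return text


line = "     xxxx_domain     complement(complement(1..145))"
print(collapse_double_complement(line))
```

The cleaned text could then be wrapped in a file-like object (e.g. `StringIO`) and passed to `SeqIO.read(handle, "genbank")` in place of the original file handle.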

Upvotes: 1
