Reputation: 11
I'm trying to read a list of XML files generated through multiple sequence submissions to the NCBI blast website. From each file, I want to print certain lines of information.
The files I want to read are all given the suffix "_recombination.xml"
.
for file in glob.glob("*_recombination.xml"):
result_handle= open(file)
blast_record=NCBIXML.read(result_handle)
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
print "*****Alignment****"
print "sequence:", alignment.title
print "length:", alignment.length
print "e-value:", hsp.expect
print hsp.query
print hsp.match
print hsp.sbjct
The script first finds all the files with the "_recombination.xml"
suffix, I then want it to read each file, and print certain lines (this is pretty much a straight copy from the BioPython cook book), which it seems to do. But I get the following error:
Traceback (most recent call last):
File "Scripts/blast_test.py", line 202, in <module>
blast_record=NCBIXML.read(result_handle)
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 576, in read
first = iterator.next()
File "/Library/Python/2.7/site-packages/Bio/Blast/NCBIXML.py", line 643, in parse
expat_parser.Parse("", True) # End of XML record
xml.parsers.expat.ExpatError: no element found: line 3106, column 7594
I'm not really sure what the problem is. I'm not sure if it's trying to loop back through the files it's already read- for example, closing the files seems to help:
for file in glob.glob("*_recombination.xml"):
result_handle= open(file)
blast_record=NCBIXML.read(result_handle)
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
print "*****Alignment****"
print "sequence:", alignment.title
print "length:", alignment.length
print "e-value:", hsp.expect
print hsp.query
print hsp.match
print hsp.sbjct
result_handle.close()
blast_record.close()
But it also gives me another error:
Traceback (most recent call last):
File "Scripts/blast_test.py", line 213, in <module> blast_record.close()
AttributeError: 'Blast' object has no attribute 'close'
Upvotes: 1
Views: 1556
Reputation: 1
I would add a comment to Biogeek answer but I cannot (not enough reputation yet). In deed he is right, you should use
NCBIXML.parse(open(input_xml))
instead of NCBIXML.read(open(input_xml)) because you are "trying to read a list of XML files" and for lists of XML files you need to parse and not read. Did it fix your problem?
Upvotes: 0
Reputation: 41
I usually use the parse method instead of read, maybe it could help you:
for blast_record in NCBIXML.parse(open(input_xml)):
for alignment in blast_record.alignments:
for hsp in alignment.hsps:
print "*****Alignment****"
print "sequence:", alignment.title
print "length:", alignment.length
print "e-value:", hsp.expect
print hsp.query
print hsp.match
print hsp.sbjct
and be sure that your xmls are generated with -outfmt 5 in your query blast
Upvotes: 3