Reputation: 189
Today, when I ran the following code, I suddenly got this Biopython error and the script stopped: 'FastaIterator' object has no attribute 'records'
I have never had any errors with this code before, so I'm confused.
from Bio import __version__
print('\n\nBiopython Version : ', __version__, '\n\n')
from Bio import SeqIO
seq = SeqIO.parse(concensus_path, "fasta")
for record in seq.records:
    SeqIO.write(record, folder + '/' + record.name.split('(')[0].replace('_0_', '_') + '.fasta', "fasta")
This is the first part of a longer script. It splits a FASTA file containing multiple DNA sequences into separate FASTA files, each containing a single DNA sequence.
Is there any way to fix this? The input FASTA file itself has no problems. I also tried a file that had worked fine before, and it gave the same error.
Upvotes: 0
Views: 66
Reputation: 26
According to the docs here, you can access the records by just iterating over the returned iterator:
from Bio import __version__
print('\n\nBiopython Version : ', __version__, '\n\n')
from Bio import SeqIO
for record in SeqIO.parse("example.fasta", "fasta"):
    print(record.id)
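Applied to the script in the question, that could look like the sketch below. The helper names output_name and split_fasta are made up for illustration; the file-naming rule is copied from the question, and the code assumes Biopython is installed. It does not rely on a records attribute, so it should not depend on which of the two versions is installed.

```python
from pathlib import Path


def output_name(record_name: str) -> str:
    """Build the output file name with the same rule as the question's script."""
    return record_name.split('(')[0].replace('_0_', '_') + '.fasta'


def split_fasta(fasta_path: str, folder: str) -> int:
    """Write each record of a multi-record FASTA file to its own file.

    Hypothetical helper; returns the number of files written.
    """
    # Imported here so output_name() stays usable without Biopython installed.
    from Bio import SeqIO

    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    # Iterate over the parser object itself; no .records attribute is needed.
    for record in SeqIO.parse(fasta_path, "fasta"):
        SeqIO.write(record, str(out_dir / output_name(record.name)), "fasta")
        count += 1
    return count
```

Calling split_fasta(concensus_path, folder) would then replace the loop over seq.records.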
Between versions 1.84 and 1.85, the object returned by SeqIO.parse(...), of type <class 'Bio.SeqIO.FastaIO.FastaIterator'>, lost its records attribute, which I thought was just unpacking the iterator into memory***.
Try installing Biopython 1.84 with pip install -v biopython==1.84 and then, for an input like fasta_test.fasta:
>DNA_sequence_1
GCAAAAGAACCGCCGCCACTGGTCGTGAAAGTGGTCGATCCAGTGACATCCCAGGTGTTGTTAAATTGAT
CATGGGCAGTGGCGGTGTAGGCTTGAGTACTGGCTACAACAACACTCGCACTACCCGGAGTGATAGTAAT
GCCGGTGGCGGTACCATGTACGGTGGTGAAGT
>DNA_sequence_2
TCCCAGCCAGCAGGTAGGGTCAAAACATGCAAGCCGGTGGCGATTCCGCCGACAGCATTCTCTGTAATTA
ATTGCTACCAGCGCGATTGGCGCCGCGACCAGGATCCTTTTTAACCATTTCAGAAAACCATTTGAGTCCA
TTTGAACCTCCATCTTTGTTC
>DNA_sequence_3
AACAAAAGAATTAGAGATATTTAACTCCACATTATTAAACTTGTCAATAACTATTTTTAACTTACCAGAA
AATTTCAGAATCGTTGCGAAAAATCTTGGGTATATTCAACACTGCCTGTATAACGAAACACAATAGTACT
TTAGGCTAACTAAGAAAAAACTTT
try to run:
from Bio import __version__
print('\n\nBiopython Version : ', __version__, '\n\n')
from Bio import SeqIO
import sys
concensus_path = 'fasta_test.fasta'
seq = SeqIO.parse(concensus_path, "fasta")
print('\n\ntype(seq) : ', type(seq), '\n')
print('\n\nseq.records size : ', sys.getsizeof(seq.records),'\n\n')
print('\n\nseq. size : ', sys.getsizeof(seq),'\n\n')
and tell us if you see any difference
ADDENDUM:
***I was wrong: seq.records returns a generator!
Try adding more records to the fasta_test.fasta file and compare the previous object sizes with:
seq = SeqIO.parse(concensus_path, "fasta")
recs = [i for i in seq.records]
# print(recs)
print('all records size : ' , sys.getsizeof(recs))
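The reason the size comparison is informative can be shown without Biopython. In this minimal sketch, fake_records is a made-up stand-in for a record generator: the generator object has the same size however many items it will yield, while a list built from it grows with the number of items.

```python
import sys


def fake_records(n):
    """Stand-in for a record generator: lazily yields n sequence-like strings."""
    for i in range(n):
        yield "ACGT" * 50 + str(i)


small_gen = fake_records(3)
large_gen = fake_records(3000)

# A generator object only stores its paused state, so the item count is irrelevant.
gen_sizes_equal = sys.getsizeof(small_gen) == sys.getsizeof(large_gen)

# A list stores one reference per element, so its own size grows with the count.
small_list = list(small_gen)
large_list = list(large_gen)
list_grows = sys.getsizeof(large_list) > sys.getsizeof(small_list)

print("generator sizes equal:", gen_sizes_equal)
print("list grows with records:", list_grows)
```

Note that sys.getsizeof reports only the container itself, not the objects it refers to.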
I think that in 1.84 seq.records is created in the __init__ of class SequenceIterator, in biopython/Bio/SeqIO/Interfaces.py:
....
....
try:
    self.records = self.parse(self.stream)
....
....
FastaIterator inherits this attribute because it is defined as class FastaIterator(SequenceIterator), and FastaIterator objects are what SeqIO.parse returns.
In 1.85, class FastaIterator(SequenceIterator) lost its parse method too.
In 1.84 it is at line 189:
def parse(self, handle):
    """Start parsing the file, and return a SeqRecord generator."""
    records = self.iterate(handle)  # iterate is the next method defined in the class
    return records
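To make the difference concrete, here is a toy model of the two styles. It is not Biopython's source: the class names are invented, the "records" are just header strings, and the second class is only an assumption about the general shape of an iterator that has no records attribute. It shows why code that iterates over the object keeps working while code that reaches for .records breaks.

```python
class OldStyleIterator:
    """Toy model of the 1.84-style pattern quoted above (not Biopython's code)."""

    def __init__(self, stream):
        self.stream = stream
        # The generator returned by parse() is stored as a public attribute.
        self.records = self.parse(self.stream)

    def parse(self, handle):
        return self.iterate(handle)

    def iterate(self, handle):
        for line in handle:
            if line.startswith(">"):
                yield line[1:].strip()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.records)


class NewStyleIterator:
    """Toy iterator that yields items itself and has no .records attribute."""

    def __init__(self, stream):
        self._lines = iter(stream)

    def __iter__(self):
        return self

    def __next__(self):
        for line in self._lines:
            if line.startswith(">"):
                return line[1:].strip()
        raise StopIteration


lines = [">a", "ACGT", ">b", "GGCC"]
old_ids = list(OldStyleIterator(lines))
new_ids = list(NewStyleIterator(lines))
print(old_ids, new_ids)
print(hasattr(OldStyleIterator(lines), "records"),
      hasattr(NewStyleIterator(lines), "records"))
```

Both toy classes give the same items when iterated directly, which is why looping over the object returned by SeqIO.parse is the safer habit.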
Upvotes: 1