Reputation: 31
In Python, this code, where I call SeqIO.parse() directly in the loop, runs fine:
from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)
for asq in SeqIO.parse("a.fasta", "fasta"):
    print("Q")
But in this version, where I first store the output of SeqIO.parse() in a variable(?) called a and then try to use it in my loop, the loop doesn't run:
from Bio import SeqIO
a = SeqIO.parse("a.fasta", "fasta")
records = list(a)
for asq in a:
    print("Q")
Is this because the output of SeqIO.parse("a.fasta", "fasta") is stored in 'a' differently from when I call the function directly? What exactly is 'a' here? Is it a variable? Is it an object? What does the function actually return?
Upvotes: 3
Views: 1081
Reputation: 163
I have a similar issue: the parsed sequence file doesn't work inside a for loop. Code below:
import pandas as pd
from Bio import SeqIO

genomes_l = pd.read_csv('test_data.tsv', sep='\t', header=None, names=['anonymous_gsa_id', 'genome_id'])
# sample_f = SeqIO.parse('SAMPLE.fasta', 'fasta')
for i, r in genomes_l.iterrows():
    genome_name = r['anonymous_gsa_id']
    genome_ids = r['genome_id'].split(',')
    # Works, but re-reads SAMPLE.fasta for every row of the table
    genome_contigs = [rec for rec in SeqIO.parse('SAMPLE.fasta', 'fasta') if rec.id in genome_ids]
    with open(f'out_dir/{genome_name}_contigs.fasta', 'w') as handle:
        SeqIO.write(genome_contigs, handle, 'fasta')
Originally, I read the file in once as sample_f; however, inside the loop it wouldn't work. I would appreciate any help to avoid having to read the file over and over again, specifically in the line below:
genome_contigs = [rec for rec in SeqIO.parse('SAMPLE.fasta', 'fasta') if rec.id in genome_ids]
Thank you!
Upvotes: 0
Reputation: 41168
SeqIO.parse() returns a normal Python generator. This part of the Biopython module is written in pure Python:
>>> from Bio import SeqIO
>>> a = SeqIO.parse("a.fasta", "fasta")
>>> type(a)
<class 'generator'>
Once a generator has been iterated over, it is exhausted, as you discovered. (In newer Biopython releases, type(a) may report an iterator class rather than generator; it is still exhausted after one pass.) You can't rewind a generator, but you can store its contents in a list or dict if you don't mind putting everything in memory (useful if you need random access). You can use SeqIO.to_dict(a) on a fresh, unconsumed parser to build a dictionary with the record ids as the keys and the SeqRecord objects as the values. Simply rebuilding the generator by calling SeqIO.parse() again avoids loading the file contents into memory, of course.
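To make the exhaustion visible without needing a FASTA file or Biopython installed, here is a minimal sketch. Note that parse_fasta() is a hypothetical stand-in I wrote for illustration, not Biopython's parser, and the sequences and genome names are made up. With Biopython itself the equivalent idea would be to call list(...) or SeqIO.to_dict(...) once on a fresh SeqIO.parse(...) result and reuse that.

```python
import io

# Made-up FASTA content so the example needs no file on disk.
FASTA_TEXT = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n"


def parse_fasta(handle):
    """Hypothetical stand-in for SeqIO.parse(): lazily yields (id, sequence) pairs."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


a = parse_fasta(io.StringIO(FASTA_TEXT))
records = list(a)    # consumes everything the generator can produce
leftover = list(a)   # the generator is exhausted, so nothing is left

# Build the lookup once, then reuse it inside a loop as often as needed.
by_id = dict(records)
wanted_groups = {"genome_A": ["seq1", "seq3"], "genome_B": ["seq2"]}
selected = {
    name: [by_id[i] for i in ids if i in by_id]
    for name, ids in wanted_groups.items()
}

print(len(records), len(leftover))
print(selected)
```

The same pattern addresses the follow-up question above: parse once before the loop, keep the result in a list or dict, and do the per-genome filtering against that stored collection rather than against the parser object. The trade-off is memory, since every record stays loaded.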
Upvotes: 5