Reputation: 1177
In Python/Biopython, I am trying to get an enumerated list of protein sequences whose description contains the string "Human adenovirus". The problem with the code below is that the enumeration counts every sequence that is parsed, not only those that pass the if filter.
EDITED CODE with proper syntax:
from Bio import SeqIO
import sys

sys.stdout = open("out_file.txt","w")
for index, seq_record in enumerate(SeqIO.parse("in_file.txt", "fasta")):
    if "Human adenovirus" in seq_record.description:
        print "%i]" % index, str(seq_record.description)
        print str(seq_record.seq) + "\n"
This is a piece of the input file:
>gi|927348286|gb|ALE15299.1| penton [Bottlenose dolphin adenovirus 1]
MQRPQQTPPPPYESVVEPLYVPSRYLAPSEGRNSIRYSQLPPLYD
>gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]
MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY
LVDNKSTDVASLNYQNDHSNFLTTVIQNNDY
>gi|1194445857|dbj|BAX56610.1| fiber, partial [Human mastadenovirus C]
FNPVYPYDTETGPPTVPFLTPPFVSPNG
The output file I get looks like this:
2] gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]
MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY
LVDNKSTDVASLNYQNDHSNFLTTVIQNNDY
I would like the first sequence that passes the filter to be numbered 1], not 2] as shown above. I know I need to somehow add a counter after the if statement, but I have tried many alternatives and I do not get the desired output. This should not be difficult; I know how to do it in Perl but not with Python/Biopython.
Upvotes: 0
Views: 196
Reputation: 8332
The issue is that you only want to increment the index if the description contains "Human adenovirus", but you are enumerating everything.
If we modify your code sample to only increment the index when a match is found, we get this:
from Bio import SeqIO

index = 0  # counts only the records that pass the filter
with open("out_file.txt", "w") as f:
    for seq_record in SeqIO.parse("in_file.txt", "fasta"):
        if "Human adenovirus" in seq_record.description:
            index += 1
            f.write("%i] %s\n" % (index, seq_record.description))
            f.write(str(seq_record.seq) + "\n\n")
Btw, there is no need to reassign sys.stdout: writing to the file handle directly, as above, puts the output in out_file.txt and leaves normal printing untouched.
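An alternative that keeps enumerate is to filter first and number afterwards, starting the count at 1. The sketch below is only an illustration: it uses plain header strings (copied from the sample input in the question) in place of Biopython records, and number_matches is a made-up helper name, not part of Biopython.

```python
def number_matches(descriptions, needle):
    """Return (counter, description) pairs for descriptions containing needle.

    The counter starts at 1 and advances only for matching items, because
    enumerate() is applied after the filter, not before it.
    """
    matching = (d for d in descriptions if needle in d)
    return list(enumerate(matching, start=1))


# Headers taken from the sample input in the question.
headers = [
    "gi|927348286|gb|ALE15299.1| penton [Bottlenose dolphin adenovirus 1]",
    "gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]",
    "gi|1194445857|dbj|BAX56610.1| fiber, partial [Human mastadenovirus C]",
]

numbered = number_matches(headers, "Human adenovirus")
for counter, description in numbered:
    print("%i] %s" % (counter, description))
```

With real data, the same pattern should work on a generator of records from SeqIO.parse filtered on seq_record.description, so the explicit index variable is no longer needed.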
Upvotes: 2