Lucas

Reputation: 1177

Python/Biopython: get an enumerated list of sequences matching a string after parsing a file of protein sequences

In Python/Biopython, I am trying to get an enumerated list of the protein sequences whose description contains the string "Human adenovirus". The problem with the code below is that the numbering counts every sequence that is parsed, not just those that pass the if filter.

EDITED CODE with proper syntax:

from Bio import SeqIO
import sys  
sys.stdout = open("out_file.txt","w")

for index, seq_record in enumerate(SeqIO.parse("in_file.txt", "fasta")):
    if "Human adenovirus" in seq_record.description:

        print "%i]" % index, str(seq_record.description) 
        print str(seq_record.seq) + "\n"

This is a piece of the input file:

>gi|927348286|gb|ALE15299.1| penton [Bottlenose dolphin adenovirus 1]
MQRPQQTPPPPYESVVEPLYVPSRYLAPSEGRNSIRYSQLPPLYD

>gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]
MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY
LVDNKSTDVASLNYQNDHSNFLTTVIQNNDY

>gi|1194445857|dbj|BAX56610.1| fiber, partial [Human mastadenovirus C]
FNPVYPYDTETGPPTVPFLTPPFVSPNG

The output file I get looks like this:

2] gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]
MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY
LVDNKSTDVASLNYQNDHSNFLTTVIQNNDY

I would like the first sequence that passes the filter to be numbered 1], not 2] as shown above. I know I need to add a counter somewhere after the if statement, but I have tried many alternatives without getting the desired output. This should not be difficult; I know how to do it in Perl but not in Python/Biopython.

Upvotes: 0

Views: 196

Answers (1)

imolit

Reputation: 8332

The issue is that you only want to increment the index if the description contains "Human adenovirus", but you are enumerating everything.

If we modify your code sample to only increment the index when a match is found, we get this:

from Bio import SeqIO

index = 0  # counts only the records that match
with open("out_file.txt", "w") as f:
    for seq_record in SeqIO.parse("in_file.txt", "fasta"):
        if "Human adenovirus" in seq_record.description:
            index += 1
            # write to the file handle instead of printing to stdout
            f.write("%i] %s\n" % (index, seq_record.description))
            f.write(str(seq_record.seq) + "\n\n")

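Btw, reassigning sys.stdout does work, but it redirects every later print in the script; writing through an explicit file handle keeps the output confined to this one file.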
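
As an alternative to the manual counter, you could filter first and then enumerate only the matches, starting at 1. The following is a minimal sketch, not a tested Biopython script: the helper name numbered_matches and the Record stand-in are made up for illustration, so that it runs without Biopython installed. Only the .description and .seq attributes of a record are used.

```python
from collections import namedtuple

# Hypothetical stand-in for Biopython's SeqRecord; only the two
# attributes used below (.description and .seq) are modelled.
Record = namedtuple("Record", ["description", "seq"])


def numbered_matches(records, keyword):
    """Yield (number, record) pairs for matching records, numbered from 1."""
    # Filter lazily, then number only the records that passed the filter.
    matches = (rec for rec in records if keyword in rec.description)
    return enumerate(matches, start=1)


# Sample records taken from the input file shown in the question.
records = [
    Record("gi|927348286|gb|ALE15299.1| penton [Bottlenose dolphin adenovirus 1]",
           "MQRPQQTPPPPYESVVEPLYVPSRYLAPSEGRNSIRYSQLPPLYD"),
    Record("gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]",
           "MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY"),
    Record("gi|1194445857|dbj|BAX56610.1| fiber, partial [Human mastadenovirus C]",
           "FNPVYPYDTETGPPTVPFLTPPFVSPNG"),
]

lines = ["%i] %s" % (number, rec.description)
         for number, rec in numbered_matches(records, "Human adenovirus")]
```

With Biopython, you would pass SeqIO.parse("in_file.txt", "fasta") as the records argument and write each numbered line, followed by str(rec.seq), to the output file.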

Upvotes: 2
