user1998510

Reputation: 75

Biopython: Looping over a dictionary full of FASTA sequences while simultaneously renaming the headers

In Biopython, you have the option to turn a multi-sequence FASTA file into a dictionary.

from Bio import SeqIO

handle = open("file.fasta", "r")
records = SeqIO.parse(handle, "fasta")
records_dict = SeqIO.to_dict(records)
handle.close()

I have another file (CSV) with a list of sequence identifiers in the first column and the corresponding header line in the second column.

import csv
identifier_mapper = open("identifier_mapper.csv", "r")
data = csv.reader(identifier_mapper, delimiter=',')

For example :

SBNP0002Q39M,Artificial Sequence 1
SBNP0004AJIU,Artificial Sequence 2
SBNP0004AJIV,Artificial Sequence 3
SBNP0004AJHM,Artificial Sequence 4

In my fasta file, all of the sequences have the sequence key in the header. I can use string parsing to get hold of them.
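The string parsing mentioned above could look roughly like the sketch below. It assumes the key is the second `|`-separated field of the record id, as in the loop further down. The helper name `extract_key` and the sample header are made up for illustration.

```python
def extract_key(record_id):
    """Return the second '|'-separated field of a FASTA record id."""
    return record_id.split("|")[1]


# Hypothetical header, invented for this example.
print(extract_key("xx|SBNP0002Q39M|example"))  # prints SBNP0002Q39M
```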

My question: For some reason when I try to loop over the dictionary and the csv reader, it always seems to terminate after checking the first sequence in the list! No idea why?!

for m in records_dict:
    sequence_key1 = (records_dict[m].id).split("|")[1]
    for row in data:
        sequence_key = row[0]
        organism= row[1]
        if sequence_key1 == sequence_key:
              print ">" + sequence_key + organism 
              print records_dict[m].seq

Any help would be appreciated!

Upvotes: 1

Views: 366

Answers (1)

Jose Ricardo Bustos M.

Reputation: 8164

You need to change this line (convert the reader to a list):

identifier_mapper = open("identifier_mapper.csv","r")
data = list(csv.reader(identifier_mapper,delimiter = ','))

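csv.reader returns an iterator, which can only be consumed once. The inner loop runs it to the end (StopIteration) during the first pass of the outer loop, so on every later pass there are no rows left to compare against. That is why only the first sequence in the dictionary gets checked.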
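A minimal sketch of the effect, written for Python 3 and using an in-memory string as a stand-in for identifier_mapper.csv (the two rows are taken from the question). It also shows one possible alternative to the nested loop, which is my own suggestion rather than part of the fix above: read the rows into a dict once and look each key up directly.

```python
import csv
import io

# In-memory stand-in for identifier_mapper.csv (assumption for this demo).
csv_text = (
    "SBNP0002Q39M,Artificial Sequence 1\n"
    "SBNP0004AJIU,Artificial Sequence 2\n"
)

# A csv.reader is a one-shot iterator: a second pass over it yields nothing.
reader = csv.reader(io.StringIO(csv_text), delimiter=",")
first_pass = [row[0] for row in reader]
second_pass = [row[0] for row in reader]
print(first_pass)   # ['SBNP0002Q39M', 'SBNP0004AJIU']
print(second_pass)  # []

# Alternative: build a dict once, then look up each sequence key directly.
id_to_description = {
    row[0]: row[1]
    for row in csv.reader(io.StringIO(csv_text), delimiter=",")
}
print(id_to_description.get("SBNP0004AJIU"))  # Artificial Sequence 2
```

With the dict, the outer loop over records_dict no longer needs an inner loop at all, which also avoids rescanning every CSV row for every sequence.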

Upvotes: 1
