In Biopython, you can load a multi-sequence FASTA file into a dictionary:
from Bio import SeqIO
handle = open("file.fasta", "r")
records = SeqIO.parse(handle, "fasta")
records_dict = SeqIO.to_dict(records)
handle.close()
I have another file (a CSV) with sequence identifiers in the first column and the corresponding header text in the second column:
import csv
identifier_mapper = open("identifier_mapper.csv", "r")
data = csv.reader(identifier_mapper, delimiter=",")
For example:
SBNP0002Q39M,Artificial Sequence 1
SBNP0004AJIU,Artificial Sequence 2
SBNP0004AJIV,Artificial Sequence 3
SBNP0004AJHM,Artificial Sequence 4
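Since each identifier appears once in the first column, these rows could also be read into a plain dictionary. This is a minimal sketch, assuming Python 3 and unique identifiers; load_mapping is a hypothetical helper, and the CSV text is the example above held in memory rather than read from identifier_mapper.csv.

```python
import csv
import io

# The example rows from above; in real use, pass
# open("identifier_mapper.csv", newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """SBNP0002Q39M,Artificial Sequence 1
SBNP0004AJIU,Artificial Sequence 2
SBNP0004AJIV,Artificial Sequence 3
SBNP0004AJHM,Artificial Sequence 4
"""


def load_mapping(handle):
    """Map each identifier (column 1) to its description (column 2).

    Blank lines are skipped; a repeated identifier would silently
    overwrite the earlier one (assumption: identifiers are unique).
    """
    return {row[0]: row[1] for row in csv.reader(handle) if row}


mapping = load_mapping(io.StringIO(SAMPLE))
print(mapping["SBNP0004AJIV"])  # Artificial Sequence 3
```

A dictionary is consumed only once while it is built, and every later lookup is a direct key access rather than a rescan of the rows.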
In my FASTA file, every sequence header contains the sequence key, which I can extract with string parsing.
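The loop below takes the key as the second "|"-separated field of the record ID. The question does not show a real header, so the ID format here (something like seq|SBNP0002Q39M|example) is an assumption, and extract_key is a hypothetical helper that also fails loudly when no "|" is present.

```python
def extract_key(record_id):
    """Return the second '|'-separated field of a FASTA record ID.

    Assumes IDs shaped like 'xx|SBNP0002Q39M|...'; the real header
    layout is not shown in the question.
    """
    parts = record_id.split("|")
    if len(parts) < 2:
        raise ValueError("no '|'-separated key in %r" % record_id)
    return parts[1]


print(extract_key("seq|SBNP0002Q39M|example"))  # SBNP0002Q39M
```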
My question: when I loop over the dictionary and, inside that loop, over the CSV reader, it always seems to stop after checking the first sequence in the dictionary. I have no idea why!
for m in records_dict:
    sequence_key1 = (records_dict[m].id).split("|")[1]
    for row in data:
        sequence_key = row[0]
        organism = row[1]
        if sequence_key1 == sequence_key:
            print(">" + sequence_key + organism)
            print(records_dict[m].seq)
Any help would be appreciated!
Change these lines so that the reader is converted to a list:
identifier_mapper = open("identifier_mapper.csv","r")
data = list(csv.reader(identifier_mapper,delimiter = ','))
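With the rows stored in a list, the same nested loop can be repeated for every record. Here is a self-contained sketch of that, assuming Python 3; plain strings stand in for Biopython SeqRecord objects, and the IDs and sequences are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for records_dict: record ID -> sequence string.
records = {
    "seq|SBNP0002Q39M|a": "ACGT",
    "seq|SBNP0004AJIU|b": "GGCC",
}
csv_text = "SBNP0002Q39M,Artificial Sequence 1\nSBNP0004AJIU,Artificial Sequence 2\n"

# list() reads every row once, so the inner loop can be restarted freely.
data = list(csv.reader(io.StringIO(csv_text), delimiter=","))

matches = []
for rec_id, seq in records.items():
    key = rec_id.split("|")[1]
    for sequence_key, organism in data:
        if key == sequence_key:
            # A space is added between the key and the description.
            matches.append((">" + sequence_key + " " + organism, seq))

for header, seq in matches:
    print(header)
    print(seq)
```

Both records now find their row, because every pass of the outer loop iterates over the full list again.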
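csv.reader returns an iterator, not a list. The first time the inner loop (for row in data) runs, it reads every row and the iterator reaches StopIteration. On every later pass of the outer loop there are no rows left to compare, which is why only the first sequence in the dictionary is ever matched.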
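The exhaustion is easy to see in isolation. This minimal sketch (Python 3, with an in-memory CSV instead of a file) iterates the same reader twice:

```python
import csv
import io

reader = csv.reader(io.StringIO("a,1\nb,2\n"))

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # reader is already exhausted

print(first_pass)   # [['a', '1'], ['b', '2']]
print(second_pass)  # []
```

Wrapping the reader in list(), as in the fix above, trades a little memory for the ability to loop over the rows as often as needed.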