Reputation: 99
I have a file in FASTA format with several DNA sequences. I want to replace the content of each sequence with another, shorter sequence, keeping the same sequence id. The new sequences are in a list.
with open("outfile.fa", "w") as f:
    for seq_record in SeqIO.parse("ma-all-mito.fa", "fasta"):
        for i in range(len(newSequences_ok)):
            f.write(str(seq_record.id[i]) + "\n")
            f.write(str(newSequences_ok[i]) + "\n")
But I get:
IndexError: string index out of range
How could I change the code so that it works? I think the problem is that I need to iterate both through the original fasta file and through the list with the new sequences.
The original fasta file looks like this:
>Sequence1
ATGATGCATGG
>Sequence2
TTTTGGGAATC
>Sequence3
GGGCTAACTAC
>Sequence4
ATCTCAGGAA
And the list with the new sequences is similar to this one:
newSequences_ok = ["ATGG", "TTTC", "GGTA", "CTCG"]
The output that I would like to get is:
>Sequence1
ATGG
>Sequence2
TTTC
>Sequence3
GGTA
>Sequence4
CTCG
Upvotes: 1
Views: 1446
Reputation: 51998
I think that this might work:
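from Bio import SeqIO

records = SeqIO.parse("ma-all-mito.fa", "fasta")
with open("outfile.fa", "w") as f:
    # zip pairs each record with the new sequence at the same position
    for r, s in zip(records, newSequences_ok):
        f.write('>' + r.id + '\n')  # FASTA header line keeps the original id
        f.write(s + '\n')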
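If not (and even if it does), you really need to read up on how Biopython works. You were treating SeqIO.parse as something which directly returns the lines of the file. Instead, it returns an iterator of SeqRecord objects. Each SeqRecord has an id attribute (the header text you want to keep) and a seq attribute holding a Seq object; str(record.seq) gives the sequence as a plain string. (In older Biopython releases, Seq objects also carried an alphabet attribute.) That is also where the IndexError comes from: seq_record.id is a string, so seq_record.id[i] picks out its i-th character and fails as soon as i reaches the length of the id. You should concentrate on being able to extract the information that you are interested in before you try to modify it.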
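To make the pairing idea concrete, here is a minimal sketch of walking the records and the new sequences in step. The function name write_replaced_fasta and the FakeRecord stand-in are made up for illustration (they are not part of Biopython); the only assumption is that each record exposes an id attribute, as SeqRecord does.

```python
import io
from collections import namedtuple


def write_replaced_fasta(records, new_sequences, handle):
    """Write one FASTA entry per (record, new sequence) pair; return the count."""
    count = 0
    # zip walks both iterables in step and stops at the shorter one
    for record, new_seq in zip(records, new_sequences):
        handle.write(">" + record.id + "\n")  # header keeps the original id
        handle.write(str(new_seq) + "\n")     # body is the replacement sequence
        count += 1
    return count


# Hypothetical stand-in for Biopython's SeqRecord (only `id` and `seq` are modelled).
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])
records = [
    FakeRecord("Sequence1", "ATGATGCATGG"),
    FakeRecord("Sequence2", "TTTTGGGAATC"),
]
new_sequences_ok = ["ATGG", "TTTC"]

out = io.StringIO()
written = write_replaced_fasta(records, new_sequences_ok, out)
print(out.getvalue())

# With Biopython installed, the same function could be called as:
#   from Bio import SeqIO
#   with open("outfile.fa", "w") as f:
#       write_replaced_fasta(SeqIO.parse("ma-all-mito.fa", "fasta"), newSequences_ok, f)
```

One thing to watch: zip silently stops at the shorter input, so if the list and the file have different lengths, the extra entries are dropped without an error. Comparing the returned count with len(newSequences_ok) is a cheap way to catch that.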
Upvotes: 1