newa123

Reputation: 99

Change DNA sequences in fasta file using Biopython

I have a file in fasta format with several DNA sequences. I want to replace the content of each sequence with a shorter sequence, keeping the same sequence id. The new sequences are in a list. This is what I tried:

with open("outfile.fa", "w") as f:
    for seq_record in SeqIO.parse("ma-all-mito.fa", "fasta"):
        for i in range(len(newSequences_ok)):
            f.write(str(seq_record.id[i]) + "\n")
            f.write(str(newSequences_ok[i]) + "\n")  

But I get:

IndexError: string index out of range

How could I change the code so that it works? I think the problem is that I need to iterate both through the original fasta file and through the list with the new sequences.

The original fasta file looks like this:

>Sequence1
ATGATGCATGG
>Sequence2
TTTTGGGAATC
>Sequence3
GGGCTAACTAC
>Sequence4
ATCTCAGGAA

And the list with the new sequences is similar to this one:

newSequences_ok = ["ATGG", "TTTC", "GGTA", "CTCG"]

The output that I would like to get is:

>Sequence1
ATGG
>Sequence2
TTTC
>Sequence3
GGTA
>Sequence4
CTCG

Upvotes: 1

Views: 1446

Answers (1)

John Coleman

Reputation: 51998

I think that this might work:

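from Bio import SeqIO

records = SeqIO.parse("ma-all-mito.fa", "fasta")
with open("outfile.fa", "w") as f:
    # Pair each record with its replacement sequence, in file order
    for r, s in zip(records, newSequences_ok):
        f.write(">" + r.id + "\n")  # header line: keep the original ID
        f.write(str(s) + "\n")      # the new, shorter sequence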

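If not (and even if it does), you really need to read up on how Biopython works. You were treating SeqIO.parse as something that directly returns the lines of the file. Instead, it returns an iterator of SeqRecord objects. Each SeqRecord has an id attribute, which is a plain string, so seq_record.id[i] picks out a single character of the ID and raises the IndexError as soon as i reaches the length of the ID. It also has a seq attribute holding a Seq object, which str() converts to the plain sequence. Your nested loop would also have written every new sequence once per record; zip pairs each record with exactly one new sequence. You should concentrate on being able to extract the information that you are interested in before you try to modify it.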
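To make the pairing logic easier to check without the real input file, here is a minimal sketch in plain Python that does not use Biopython at all. The helpers fasta_ids and replace_sequences are hypothetical names invented for this illustration, and the header parsing is only a rough stand-in for what SeqIO.parse does (it assumes the ID is the first word after ">"). The data is the example from the question.

```python
import io


def fasta_ids(handle):
    """Yield the ID (first word after '>') of each FASTA header line.

    Hypothetical helper: a rough stand-in for iterating over
    SeqIO.parse(...) and reading record.id.
    """
    for line in handle:
        if line.startswith(">"):
            yield line[1:].split()[0]


def replace_sequences(in_handle, new_sequences, out_handle):
    """Write each original ID followed by its replacement sequence."""
    ids = list(fasta_ids(in_handle))
    # zip() silently stops at the shorter input, so check lengths first.
    if len(ids) != len(new_sequences):
        raise ValueError("number of records and new sequences differ")
    for seq_id, new_seq in zip(ids, new_sequences):
        out_handle.write(f">{seq_id}\n{new_seq}\n")


# Example data copied from the question.
original = io.StringIO(
    ">Sequence1\nATGATGCATGG\n>Sequence2\nTTTTGGGAATC\n"
    ">Sequence3\nGGGCTAACTAC\n>Sequence4\nATCTCAGGAA\n"
)
newSequences_ok = ["ATGG", "TTTC", "GGTA", "CTCG"]

result = io.StringIO()
replace_sequences(original, newSequences_ok, result)
print(result.getvalue(), end="")
```

The length check is a design choice of this sketch rather than something the question asked for: because zip stops at the shorter of its inputs, a list with too few sequences would otherwise drop records from the output without any error.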

Upvotes: 1
