Translating a FASTA file of CDS to proteins taking into account open reading frames

Question

I have a FASTA file with nucleotide sequences. I need to translate them into proteins, taking all three forward reading frames into account (e.g. for 'ATG', frame +1 starts at 'ATG', +2 at 'TG', and +3 at 'G'). The simple Biopython script below does a perfect job when the reading frame is +1, but for the remaining two frames it gives a different translation from the one I want. Is there a way to specify the reading frame in Biopython?

Input file

>contig20
TGGATCGGCGAGACCGACTCCGAGCGCGCCGACGTCGCCAAGGGATGGGCGTCCCTCCAGGTAAACCAACCCT
CTTCCCATCAAATTCTTTTTACCATGCAATATAGTCGTCGGTGTCGATCACTGTCATGCATATGGATTGGATT
AAACATGTCGCGGTCTCGTCGTTGCACGTTTCTTTCTTGCTTAACCACCTACCAATAGCAGCTGGTTGTAGCT
AGGTCGCTGCTGGGGATTGAAATCTTCAGCTTTAAGATGACAGCGACGACGCCATGGTCGGTCGCCCGGTCGT
GATCACCTACTCCAATTTACTGGAAAAATGATGATTTGTAAACGTGCATGCATGTTCCTTCAACCTTTTGTTA

Desired Output file

>contig20 Translated - Frame 3
DRRDRLRARRRRQGMGVPPGKPTLFPSNSFYHAI*SSVSITVMHMDWIKHVAVSSLHVSFLLNHLPIAAG
CS*VAAGD*NLQL*DDSDDAMVGRPVVITYSNLLEK**FVNVHACSFNLLL

Script

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord


def make_protein_record(nuc_record):
    """Returns a new SeqRecord with the translated sequence (default table)."""
    # translate() always starts at the first base, i.e. reading frame +1.
    return SeqRecord(
        seq=nuc_record.seq.translate(),
        id="trans_" + nuc_record.id,
        description="translation of CDS, using default table",
    )


# Translate every record lazily and write the proteins out as FASTA.
proteins = (make_protein_record(nuc_rec)
            for nuc_rec in SeqIO.parse("file.fasta", "fasta"))
SeqIO.write(proteins, "translations.fasta", "fasta")
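
One possible way to get the other two frames, offered as a sketch rather than a definitive answer: slice the nucleotide sequence at the frame offset before translating, and trim the tail to a whole number of codons so translate() is not handed a partial codon. The helper names trim_to_frame and translate_fasta are hypothetical (they are not part of Biopython), and the header text simply mimics the desired output shown above. Only the three forward frames are covered.

```python
def trim_to_frame(seq, frame):
    """Return seq starting at forward frame 1, 2 or 3, cut to whole codons.

    Works on anything sliceable, e.g. a str or a Bio.Seq.Seq.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    sub = seq[frame - 1:]                   # frame +2 drops 1 base, +3 drops 2
    return sub[:len(sub) - len(sub) % 3]    # drop a trailing partial codon


def translate_fasta(in_path, out_path, frames=(1, 2, 3)):
    """Write one protein record per input record and forward frame.

    Hypothetical helper: assumes Biopython is installed and that the
    default (standard) codon table is wanted.
    """
    # Imported here so trim_to_frame stays usable without Biopython.
    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    def records():
        for nuc in SeqIO.parse(in_path, "fasta"):
            for frame in frames:
                protein = trim_to_frame(nuc.seq, frame).translate()
                yield SeqRecord(
                    protein,
                    id=nuc.id,
                    description=f"Translated - Frame {frame}",
                )

    # SeqIO.write returns the number of records written.
    return SeqIO.write(records(), out_path, "fasta")
```

With the file names from the script above, translate_fasta("file.fasta", "translations.fasta") should write three records per contig, with headers such as ">contig20 Translated - Frame 3". For contig20, frame 3 begins at "GATCGGCGA...", which is consistent with the desired output starting "DRR". Line wrapping in the written file may differ from the example.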
