Reputation: 1151
I have a FASTA file of nucleotide sequences that I need to translate into proteins in all three forward reading frames (i.e. starting at 'ATG' for +1, 'TG' for +2 and 'G' for +3). The simple Biopython script below does a perfect job for frame +1, but it does not give me the translations I need for the other two frames. Is there a way to specify the reading frame in Biopython?
Input file
>contig20
TGGATCGGCGAGACCGACTCCGAGCGCGCCGACGTCGCCAAGGGATGGGCGTCCCTCCAGGTAAACCAACCCT
CTTCCCATCAAATTCTTTTTACCATGCAATATAGTCGTCGGTGTCGATCACTGTCATGCATATGGATTGGATT
AAACATGTCGCGGTCTCGTCGTTGCACGTTTCTTTCTTGCTTAACCACCTACCAATAGCAGCTGGTTGTAGCT
AGGTCGCTGCTGGGGATTGAAATCTTCAGCTTTAAGATGACAGCGACGACGCCATGGTCGGTCGCCCGGTCGT
GATCACCTACTCCAATTTACTGGAAAAATGATGATTTGTAAACGTGCATGCATGTTCCTTCAACCTTTTGTTA
Desired Output file
>contig20 Translated - Frame 3
DRRDRLRARRRRQGMGVPPGKPTLFPSNSFYHAI*SSVSITVMHMDWIKHVAVSSLHVSFLLNHLPIAAG
CS*VAAGD*NLQL*DDSDDAMVGRPVVITYSNLLEK**FVNVHACSFNLLL
Script
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

def make_protein_record(nuc_record):
    """Returns a new SeqRecord with the translated sequence (default table)."""
    return SeqRecord(seq=nuc_record.seq.translate(),
                     id="trans_" + nuc_record.id,
                     description="translation of CDS, using default table")

proteins = (make_protein_record(nuc_rec)
            for nuc_rec in SeqIO.parse("file.fasta", "fasta"))
SeqIO.write(proteins, "translations.fasta", "fasta")
Upvotes: 1
Views: 813
Reputation: 18521
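Can't you simply slice the sequence before translating it? Dropping the first one or two nucleotides shifts the reading frame:
from Bio.Seq import translate
contig = 'TGGATCGGCGAGACCGACTCCGAGCGCGCCGACGTCGCCAAGGGATGGGCGTCCCTCCAGGTAAACCAACCCT'
print('ORF 1')
print(translate(contig))       # frame +1: start at the first nucleotide
print('ORF 2')
print(translate(contig[1:]))   # frame +2: skip one nucleotide
print('ORF 3')
print(translate(contig[2:]))   # frame +3: skip two nucleotides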
which would print
ORF 1
WIGETDSERADVAKGWASLQVNQP
ORF 2
GSARPTPSAPTSPRDGRPSR*TNP
ORF 3
DRRDRLRARRRRQGMGVPPGKPT
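To apply the same slicing to every record in the FASTA file, something along these lines might work. This is only a sketch: the helper names `reading_frames` and `translate_fasta` are made up for illustration, and it assumes the header format from the question ("Translated - Frame N") and the standard genetic code (table 1). Each slice is trimmed to a whole number of codons, because Biopython may otherwise warn about a partial trailing codon.

```python
def reading_frames(seq):
    """Yield (frame_number, subsequence) for forward frames 1, 2 and 3.

    Each subsequence is trimmed to a whole number of codons.
    Works on plain strings and on Bio.Seq.Seq objects alike.
    """
    for offset in range(3):
        usable = max(len(seq) - offset, 0)
        end = offset + usable - usable % 3  # drop the incomplete last codon
        yield offset + 1, seq[offset:end]


def translate_fasta(in_path, out_path, table=1):
    """Write the three forward-frame translations of every FASTA record.

    Hypothetical helper; `table` is the NCBI genetic code id (1 = standard).
    Returns the number of protein records written.
    """
    # Imported here so reading_frames() stays usable without Biopython.
    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    def protein_records():
        for nuc in SeqIO.parse(in_path, "fasta"):
            for frame, sub in reading_frames(nuc.seq):
                yield SeqRecord(
                    sub.translate(table=table),
                    id=nuc.id,
                    description="Translated - Frame %d" % frame,
                )

    return SeqIO.write(protein_records(), out_path, "fasta")
```

Calling `translate_fasta("file.fasta", "translations.fasta")` should then write three protein records per input sequence, one per frame. Only the forward strand is covered here; the three reverse frames would need the same loop applied to the reverse complement.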
Upvotes: 1