Reputation: 61
Here is my solution to a problem from the Rosalind project:
def prot(rna):
    for i in xrange(3, (5*len(rna))//4+1, 4):
        rna=rna[:i]+','+rna[i:]
    rnaList=rna.split(',')
    bases=['U','C','A','G']
    codons = [a+b+c for a in bases for b in bases for c in bases]
    amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    codon_table = dict(zip(codons, amino_acids))
    peptide=[]
    for i in range (len (rnaList)):
        if codon_table[rnaList[i]]=='*':
            break
        peptide+=[codon_table[rnaList[i]]]
    output=''
    for i in peptide:
        output+=str(i)
    return output
If I run prot('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA'), I get the correct output 'MAMAPRTEINSTRING'. However, if the RNA sequence (the input string) is hundreds of nucleotides (characters) long, I get an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 11, in prot
KeyError: 'CUGGAAACGCAGCCGACAUUCGCUGAAGUGUAG'
Can you point out where I went wrong?
Upvotes: 0
Views: 3362
Reputation: 22827
This does not answer your question, but note that you could solve this very succinctly using BioPython:
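from Bio.Seq import Seq
# Note: Bio.Alphabet was removed in Biopython 1.78; on newer versions, drop
# this import and build the sequence with Seq(rna) alone.
from Bio.Alphabet import IUPAC

def rna2prot(rna):
    rna = Seq(rna, IUPAC.unambiguous_rna)
    return str(rna.translate(to_stop=True))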
For example:
>>> print rna2prot('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA')
MAMAPRTEINSTRING
Upvotes: 1
Reputation: 56644
Your code for breaking the rna into 3-char blocks is a bit nasty; you spend a lot of time breaking and rebuilding strings to no real purpose.
Building the codon_table only needs to be done once, not every time your function is run.
Here is a simplified version:
from itertools import product, takewhile

bases = "UCAG"
codons = ("".join(trio) for trio in product(bases, repeat=3))
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def prot(rna):
    rna_codons = [rna[i:i+3] for i in range(0, len(rna) - 2, 3)]
    aminos = takewhile(
        lambda amino: amino != "*",
        (codon_table[codon] for codon in rna_codons)
    )
    return "".join(aminos)
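As a quick illustration of the slicing step on its own, here is a minimal Python 3 sketch. The helper name codons_of is made up for this example; the slice expression is the one used in prot above. Because the range stops at len(rna) - 2, an incomplete trailing codon is silently dropped rather than looked up.

```python
def codons_of(rna):
    # Hypothetical helper: same slicing as in prot(), isolated for inspection.
    # Stepping by 3 and stopping at len(rna) - 2 yields only complete triplets.
    return [rna[i:i+3] for i in range(0, len(rna) - 2, 3)]

# 11 characters: three full codons plus a dangling 'AU' that is ignored.
print(codons_of("AUGGCCUGAAU"))  # ['AUG', 'GCC', 'UGA']
```

No string is rebuilt here, so the cost stays linear in the length of the input.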
Upvotes: 0
Reputation: 122024
Given that you have a KeyError, the problem must be in one of your attempts to access codon_table[rnaList[i]]. You are assuming each item in rnaList is three characters long, but evidently, at some point, that stops being true and one of the items is 'CUGGAAACGCAGCCGACAUUCGCUGAAGUGUAG'.
This happens because each time you reassign rna = rna[:i]+','+rna[i:] you change the length of rna, while the range of indices i was computed once from the original length, so the indices no longer reach the end of the string. This means that for any rna where len(rna) > 60, the last item in the list will not have length 3. If there is a stop codon before you reach that item it isn't a problem, but if you reach it you get the KeyError.
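To see the drift concretely, the chunking loop from the question can be run in isolation. This is a small Python 3 sketch (range stands in for xrange); the helper names split_with_commas and chunk_lengths are invented for illustration and are not part of the original code.

```python
def split_with_commas(rna):
    # Same comma-insertion loop as in the question. The range bound is
    # evaluated once, from the length of the string before any commas exist.
    for i in range(3, (5 * len(rna)) // 4 + 1, 4):
        rna = rna[:i] + ',' + rna[i:]
    return rna.split(',')

def chunk_lengths(rna):
    # Hypothetical helper: length of every chunk the loop produces.
    return [len(chunk) for chunk in split_with_commas(rna)]

print(chunk_lengths("AUG" * 20)[-3:])  # 60 nucleotides: [3, 3, 3]
print(chunk_lengths("AUG" * 21)[-3:])  # 63 nucleotides: [3, 3, 6]
```

At 63 nucleotides the loop runs out of indices one comma early, so the final two codons stay glued together; the longer the input, the longer that unsplit tail becomes.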
I suggest you rewrite the start of your function, e.g. using the grouper recipe from itertools:
# Python 2; in Python 3 the function is named itertools.zip_longest
from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

def prot(rna):
    rnaList = ["".join(t) for t in grouper(rna, 3)]
    ...
Note also that you can use
peptide.append(codon_table[rnaList[i]])
and
return "".join(peptide)
to simplify your code.
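Putting these suggestions together, one possible complete version looks like the sketch below. It assumes Python 3 (zip_longest instead of izip_longest), builds the table once at module level, and passes fillvalue="" so that a trailing partial codon does not make "".join fail on None. Those choices are mine rather than part of the original code, and a partial trailing codon would still raise a KeyError on lookup.

```python
from itertools import zip_longest

bases = ['U', 'C', 'A', 'G']
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def prot(rna):
    peptide = []
    # fillvalue="" is an assumption made here so that join() always gets strings
    for triple in grouper(rna, 3, fillvalue=""):
        amino = codon_table["".join(triple)]
        if amino == '*':  # stop codon: translation ends here
            break
        peptide.append(amino)
    return "".join(peptide)

print(prot('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA'))  # MAMAPRTEINSTRING
```

Because the chunks come straight from the iterator, the result no longer depends on the length of the input.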
Upvotes: 2