I have a few issues with my code and I'd appreciate some help.
The first part of the program is meant to validate the user's input, so that they cannot enter anything other than A, U, G, C, T (or the lower-case equivalents). However, if I do enter anything else, I get a very long error message, when all I want is for the program to restart the function validation_check().
Also, if the user does enter a valid sequence, for some reason my code is not translating the valid RNA sequence to a protein sequence. I think maybe it has something to do with the chunks function that separates the str in input_rna into chunks of 3 letters.
    import re

    input_rna = input("Type RNA sequence: ")

    def chunks(l, n):
        for i in range(0, len(l), n):
            yield l[i:i+n]

    def translate():
        amino_acids = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
                       "UCU":"S", "UCC":"s", "UCA":"S", "UCG":"S",
                       "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
                       "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
                       "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
                       "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
                       "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
                       "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
                       "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
                       "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
                       "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
                       "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
                       "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
                       "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
                       "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
                       "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}
        translated = "".join(amino_acids[i] for i in chunks("".join(input_rna), 3))

    def validation_check():
        global input_rna
        if re.match(r"[A, U, G, C, T, a, u, g, c, t]", input_rna):
            print("Correct! That is a valid sequence.")
            translate()
        else:
            print("That is not a valid RNA sequence, please try again.")
            validation_check()

    validation_check()
Upvotes: 0
Views: 520
Reputation: 841
In addition to the other problems pointed out, your validation_check function does not let the user enter the string again. This means you'll keep trying to validate the same string over and over without ever changing it.
What you probably want to do is something more like:
    def validation_check():
        input_rna = raw_input("Type RNA sequence: ").upper()
        if re.match(r"^[AUGCT]+$", input_rna):
            print("Correct! That is a valid sequence.")
            print translate(input_rna)
        else:
            print("That is not a valid RNA sequence, please try again.")
            validation_check()
This avoids using a global, allows the user to reinput, and doesn't automatically cause an infinite loop.
(Even so, using recursion here is probably bad, so you should think about implementing this as a while loop instead.)
You'll notice a couple of other things:
- raw_input instead of input, since in Python 2 the latter has an implicit eval. You want to steer clear of that unless you absolutely need it. (In Python 3, input already returns a plain string and raw_input no longer exists.)
- .upper() so you have standardized strings to verify and key off of. Since your dictionary of bases only uses uppercase strings, this makes more sense than using re.I as recommended elsewhere.
- I had translate return the translated protein, and then printed it. You may want to do something else.
I also added a default to your dictionary lookup:
    translated = "".join(amino_acids.get(i, '!') for i in chunks("".join(rna), 3))
This way, you can try to keep processing if you get something weird, rather than having to deal with a KeyError (which will be raised if the user inputs a sequence you don't have a key for, like 'CUT').
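To make the difference concrete, here is a small sketch (Python 3 syntax) comparing plain indexing with .get and a default. The two-entry codon_table is deliberately tiny and made up for illustration; it is not your full dictionary.

```python
# Tiny illustrative table; the real dictionary has all 64 codons.
codon_table = {"AUG": "M", "UUU": "F"}

codons = ["AUG", "CUT", "UUU"]  # "CUT" is not a key

# Plain indexing raises KeyError on the first unknown codon.
try:
    strict = "".join(codon_table[c] for c in codons)
except KeyError as err:
    strict = None
    missing = err.args[0]

# .get with a default keeps going and marks the unknown codon with '!'.
lenient = "".join(codon_table.get(c, "!") for c in codons)

print(missing)  # CUT
print(lenient)  # M!F
```

Whether a '!' placeholder or a hard failure is the better behaviour depends on what you plan to do with the protein string afterwards.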
I also noticed you allow, but don't translate, the base 'T'. You may want to look into that.
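One possible way to deal with 'T' is to normalize the input before translating. This sketch assumes you want to treat a typed T as if it were U, which is my guess and not something your question states; the helper name normalize_rna is made up for the example (Python 3 syntax).

```python
def normalize_rna(sequence):
    """Upper-case the sequence and replace T with U (assumed behaviour)."""
    return sequence.upper().replace("T", "U")

print(normalize_rna("augcut"))  # AUGCUU
```

If T should be rejected instead, dropping it from the validation pattern would be the simpler route.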
Anyway, the complete code I wound up with is:
    import re

    def chunks(l, n):
        for i in range(0, len(l), n):
            # print i
            chunk = l[i:i+n]
            # print chunk
            yield chunk

    def translate(rna):
        amino_acids = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
                       "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
                       "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
                       "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
                       "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
                       "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
                       "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
                       "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
                       "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
                       "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
                       "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
                       "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
                       "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
                       "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
                       "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
                       "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}
        # unknown codons (e.g. anything containing 'T') become '!'
        translated = "".join(amino_acids.get(i, '!') for i in chunks("".join(rna), 3))
        return translated

    def validation_check():
        input_rna = raw_input("Type RNA sequence: ").upper()
        if re.match(r"^[AUGCT]+$", input_rna):
            print("Correct! That is a valid sequence.")
            print translate(input_rna)
        else:
            print("That is not a valid RNA sequence, please try again.")
            validation_check()

    # in case you ever need to import this, don't always call validation_check
    if __name__ == "__main__":
        validation_check()
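The while-loop version mentioned earlier might look something like this. It is only a sketch in Python 3 syntax (so input rather than raw_input); read_sequence and its prompt_fn parameter are names I made up so the loop can be exercised without a keyboard.

```python
import re

def read_sequence(prompt_fn=input):
    """Keep asking until the user types only A, U, G, C or T."""
    while True:
        sequence = prompt_fn("Type RNA sequence: ").upper()
        if re.match(r"^[AUGCT]+$", sequence):
            print("Correct! That is a valid sequence.")
            return sequence
        print("That is not a valid RNA sequence, please try again.")

# Simulated user: one bad attempt, then a good one.
answers = iter(["hello", "augc"])
result = read_sequence(lambda prompt: next(answers))
print(result)  # AUGC
```

Unlike the recursive version, this cannot hit the recursion limit no matter how many times the user mistypes.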
Upvotes: 3
Reputation: 8174
Your regular expression is wrong. Try:
    if re.match(r"^[AUGCT]+$", input_rna, re.IGNORECASE):
The following is better, because RNA contains uracil instead of thymine:
    if re.match(r"^[AUGC]+$", input_rna, re.IGNORECASE):
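A quick sketch of why the original pattern lets bad input through (Python 3; the sample strings are made up). re.match only anchors at the start, and a character class without + checks a single character. The commas and spaces inside the brackets also count as allowed characters.

```python
import re

original = r"[A, U, G, C, T, a, u, g, c, t]"
fixed = r"^[AUGC]+$"

# The original pattern accepts anything whose first character is in the class,
# including a comma or a space.
print(bool(re.match(original, "AXYZ")))    # True
print(bool(re.match(original, ", oops")))  # True

# The anchored pattern with + requires every character to be a base.
print(bool(re.match(fixed, "AXYZ", re.IGNORECASE)))  # False
print(bool(re.match(fixed, "augc", re.IGNORECASE)))  # True
```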
Note: the translation step has a problem as well. For the input "ACGAUGAGUCAUGCUU",
    list(chunks("".join(input_rna), 3))
gives:
    ['ACG', 'AUG', 'AGU', 'CAU', 'GCU', 'U']
The last chunk, 'U', is the problem: when the length is not a multiple of 3, the leftover chunk is not a key in amino_acids.
Solution:
    "".join(amino_acids[i] for i in chunks("".join(input_rna), 3) if len(i)==3)
Upvotes: 3
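For reference, the leftover-chunk behaviour described in the last answer can be reproduced with a short sketch (Python 3). The amino_acids table here holds only the five codons this input needs, as an assumption to keep the example short; the real table has 64 entries.

```python
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

# Only the codons needed for this example; the full table has 64 entries.
amino_acids = {"ACG": "T", "AUG": "M", "AGU": "S", "CAU": "H", "GCU": "A"}

input_rna = "ACGAUGAGUCAUGCUU"  # 16 bases, so one base is left over

pieces = list(chunks(input_rna, 3))
print(pieces)  # ['ACG', 'AUG', 'AGU', 'CAU', 'GCU', 'U']

# Skip any chunk that is not a full codon before looking it up.
protein = "".join(amino_acids[i] for i in chunks(input_rna, 3) if len(i) == 3)
print(protein)  # TMSHA
```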