RNA to PROTEIN program questions

Question

I have a few issues with my code and would appreciate some help.

The first part of the program is meant to validate the user's input, so that they cannot enter anything other than A, U, G, C or T (upper or lower case). However, if I do enter anything else, I get a very long error message, when all I want is for the program to rerun the validation_check() function.

Also, if the user does enter a valid sequence, for some reason my code does not translate the RNA sequence into a protein sequence. I think it may have something to do with the chunks function, which splits the string in input_rna into chunks of 3 letters.

import re

input_rna = input("Type RNA sequence: ")

def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i+n]

def translate():
    amino_acids = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
        "UCU":"S", "UCC":"s", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}

    translated = "".join(amino_acids[i] for i in chunks("".join(input_rna), 3))

def validation_check():
    global input_rna
    if re.match(r"[A, U, G, C, T, a, u, g, c, t]", input_rna):
        print("Correct! That is a valid sequence.")
        translate()
    else:
        print("That is not a valid RNA sequence, please try again.")
        validation_check()
validation_check()

Eric Hughes · Accepted Answer

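In addition to the other problems already pointed out, your validation_check function never lets the user enter the string again. It keeps re-validating the same global input_rna, so an invalid sequence makes the function call itself over and over with an unchanged string until Python gives up with a "maximum recursion depth exceeded" error. That is most likely the very long error message you are seeing.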
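A stripped-down sketch of that failure mode, assuming Python 3.5+ (where the error is a RecursionError; Python 2 raises a RuntimeError instead). The check() and hits_recursion_limit() functions are hypothetical, and str.isupper() is only a stand-in for your regex:

```python
def check(seq):
    # Same shape as the original validation_check(): seq never changes,
    # so a failed check just calls check() again with the same value.
    if seq.isupper():  # stand-in validation, not the real regex
        return seq
    return check(seq)


def hits_recursion_limit(seq):
    # Report whether check() ran out of stack instead of returning.
    try:
        check(seq)
    except RecursionError:
        return True
    return False


print(hits_recursion_limit("AUGC"))  # False: valid input returns at once
print(hits_recursion_limit("xyz"))   # True: the recursion never ends
```

Nothing inside the recursion ever asks for new input, so the only way out for invalid input is the interpreter's recursion limit.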

What you probably want to do is something more like:

def validation_check():
    input_rna = raw_input("Type RNA sequence: ").upper()
    if re.match(r"^[AUGCT]+$", input_rna):
        print("Correct! That is a valid sequence.")
        print(translate(input_rna))
    else:
        print("That is not a valid RNA sequence, please try again.")
        validation_check()

This avoids using a global, lets the user re-enter the sequence, and no longer gets stuck re-checking the same string forever.

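(Even so, recursion is a poor fit here: every invalid attempt adds another stack frame, so a user who keeps typing bad input will eventually hit Python's recursion limit. You should think about implementing this as a while loop instead.)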
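A minimal while-loop sketch, written for Python 3 (where input() does not eval). The name read_valid_sequence is hypothetical, the read parameter exists only so the loop can be driven without a keyboard, and the .strip() call is an addition that ignores stray surrounding spaces:

```python
import re

# Same anchored pattern as above: every character must be A, U, G, C or T.
VALID_SEQUENCE = re.compile(r"^[AUGCT]+$")


def read_valid_sequence(read=input):
    """Keep prompting until the user types a valid sequence, then return it."""
    while True:
        seq = read("Type RNA sequence: ").strip().upper()
        if VALID_SEQUENCE.match(seq):
            print("Correct! That is a valid sequence.")
            return seq
        # Invalid input: loop around and ask again, with no recursion.
        print("That is not a valid RNA sequence, please try again.")
```

You would call it as rna = read_valid_sequence() and then pass rna to translate(). Each retry reuses the same stack frame, so there is no limit on how many times the user can get it wrong.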

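You'll notice a couple of other things:

- raw_input instead of input: in Python 2, input() evaluates whatever the user types as a Python expression (it is equivalent to eval(raw_input())). You want to steer clear of that unless you absolutely need it. (In Python 3, raw_input was renamed to input and the eval is gone, so there you should just use input().)
- .upper(), so you have standardized strings to verify and key off of. Since your codon dictionary only uses uppercase keys, this makes more sense than using re.I as suggested elsewhere: re.I would let lowercase input pass validation, but the lowercase codons would then miss every dictionary key.
- I had translate return the translated protein and then printed it; you may want to do something else with it. Your original translate() built the translated string but never returned or printed it, which is why a valid sequence produced no output. The chunks function itself was fine.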
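The regex change matters as well. In the question's pattern, the square brackets form a character class that matches exactly one character (and the commas and spaces inside it count as allowed characters too), while re.match only anchors at the start of the string, so only the first character was ever checked. The anchored ^[AUGCT]+$ requires every character to be one of the five bases. A quick comparison in Python 3; ORIGINAL, ANCHORED and is_valid are just illustrative names:

```python
import re

ORIGINAL = r"[A, U, G, C, T, a, u, g, c, t]"  # pattern from the question
ANCHORED = r"^[AUGCT]+$"                      # pattern from this answer


def is_valid(pattern, seq):
    # re.match only tries to match at the start of seq.
    return re.match(pattern, seq) is not None


for seq in ["AUGC", "Axyz", ", oops", "augc"]:
    # The anchored pattern is applied to the uppercased string, as in the answer.
    print(repr(seq), is_valid(ORIGINAL, seq), is_valid(ANCHORED, seq.upper()))
```

The original pattern accepts all four strings, including "Axyz" and ", oops", because their first characters happen to be in the class; the anchored version rejects both.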

I also added a default to your dictionary lookup:

translated = "".join(amino_acids.get(i, '!') for i in chunks("".join(rna), 3))

This way, you can keep processing when you hit something unexpected, rather than crashing with a KeyError. That error is raised whenever a chunk has no entry in the dictionary: for example 'CUT', or a leftover one- or two-base chunk at the end when the sequence length is not a multiple of 3.
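A small sketch of the difference, in Python 3. CODONS is a hypothetical three-entry subset of the real 64-codon table, and translate_strict / translate_lenient are illustrative names; the input's length is deliberately not a multiple of 3:

```python
def chunks(l, n):
    # Yield successive n-character slices of l; the last may be shorter.
    for i in range(0, len(l), n):
        yield l[i:i + n]


# Hypothetical subset of the codon table, just for illustration.
CODONS = {"AUG": "M", "UUU": "F", "UAA": "STOP"}


def translate_strict(rna):
    # Plain indexing: any chunk missing from the table raises KeyError.
    return "".join(CODONS[c] for c in chunks(rna, 3))


def translate_lenient(rna):
    # dict.get with a default: unknown chunks become "!" instead of raising.
    return "".join(CODONS.get(c, "!") for c in chunks(rna, 3))


try:
    translate_strict("AUGUUUCU")  # chunks: "AUG", "UUU", "CU"
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'CU'

print(translate_lenient("AUGUUUCU"))  # MF!
```

The "!" marks exactly where the unknown chunk was, which is often more useful than a traceback while you are still testing.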

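I also noticed you allow, but don't translate, the base 'T'. T (thymine) only occurs in DNA; RNA uses U (uracil) in its place, so none of your codon keys contain T and any chunk with a T falls through to the '!' default. You may want to look into that.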
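If you want to keep accepting T, one option, assuming your users are typing the DNA coding strand (whose sequence matches the mRNA except that it has T wherever the RNA has U), is to convert it before translating. dna_to_rna is a hypothetical helper built only on the standard str.upper() and str.replace() methods:

```python
def dna_to_rna(seq):
    """Hypothetical helper: rewrite thymine (T) as uracil (U).

    Assumes seq is a DNA coding-strand sequence; input that is already
    RNA passes through unchanged apart from being uppercased.
    """
    return seq.upper().replace("T", "U")


print(dna_to_rna("atgttttaa"))  # AUGUUUUAA
```

You could call it on the validated input just before passing it to translate(), so 'T' is accepted by the regex and also found in the codon table.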

Anyway, the complete code I wound up with is:

import re

def chunks(l, n):
    # Yield successive n-character slices of l; the last one may be shorter.
    for i in range(0, len(l), n):
        yield l[i:i+n]

def translate(rna):
    amino_acids = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
        "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G",}
    translated = "".join(amino_acids.get(i, '!') for i in chunks("".join(rna), 3)) 
    return translated

def validation_check():
    input_rna = raw_input("Type RNA sequence: ").upper()
    if re.match(r"^[AUGCT]+$", input_rna):
        print("Correct! That is a valid sequence.")
        print(translate(input_rna))
    else:
        print("That is not a valid RNA sequence, please try again.")
        validation_check()

# Only run validation_check() when this file is executed directly,
# so importing it from another module doesn't immediately prompt for input.
if __name__ == "__main__":
    validation_check()
