Ashb

Reputation: 11

Why does the if statement get bypassed?

I am trying to figure out why the second string, "peptide_seq", bypasses the first if statement. Each of the three strings is supposed to go through the if statements and return one of the results below.

The function should first convert the string to uppercase. Then:

  • Strings that contain any character that is not A, C, G, T or U should return "not an unambiguous nucleotide".
  • Strings that contain ACGU should return "RNA".
  • Strings that contain ACGT should return "DNA".
def nuc_ac_check(string):
    input_string = string.upper()

    if ('d') in input_string:
        return "not an unambiguous nucleic acid"
    elif ('A' and 'C' and 'G' and 'U') in input_string:
        return "Rna"
    elif ('A' and 'C' and 'G' and 'T') in input_string:
        return "Dna"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq= "acgsdtushnsdses"
Dna_seq = "ggatacgatc"

print ('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print ('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print ('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))

Upvotes: 1

Views: 197

Answers (5)

VirtualVDX

Reputation: 2381

The first condition should be fixed as follows:

    if 'D' in input_string:
        return "not an unambiguous nucleic acid"

Note the uppercase 'D' in the condition: the string has already been passed through .upper(), so a lowercase 'd' can never match.
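A quick check makes the difference visible. This is only an illustrative sketch using the peptide string from the question; the names lower_found and upper_found are made up for the demo.

```python
# Demonstration only: after .upper(), a lowercase letter can never match.
peptide_seq = "acgsdtushnsdses"
input_string = peptide_seq.upper()

lower_found = 'd' in input_string  # the string now holds 'D', not 'd'
upper_found = 'D' in input_string

print(lower_found, upper_found)  # False True
```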

Update: Full code, incorporating the comments from Holloway, rassar, and Copperfield:

def nuc_ac_check(string):
    input_string = string.upper()

    rna_letter = ['A', 'C', 'G', 'U']
    dna_letter = ['A', 'C', 'G', 'T']

    if 'D' in input_string:
        return "not an unambiguous nucleic acid"
    elif all(letter in rna_letter for letter in input_string):
        return "Rna"
    elif all(letter in dna_letter for letter in input_string):
        return "Dna"
    return "None of the above"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq= "acgsdtushnsdses"
Dna_seq = "ggatacgatc"
weird_string = "AGCTEXF"

print ('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print ('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print ('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))
print ('the weird string variable is: ' + nuc_ac_check(weird_string))

Upvotes: 2

Copperfield

Reputation: 8520

Just checking whether the string has a "d" or "D" in it is not enough to rule out an invalid string, because it may contain "E", "Z", "X", "1", "@" or any other character. The best solution, in my opinion, is to use a set, which reduces whatever you give it to its distinct elements.

For example:

>>> set("111122222222ddddddddRRRRRRRDDDDD")
{'D', '1', 'd', '2', 'R'}

Then you only need to compare the result against the sets of valid letters to see whether it matches one of them:

RNA = set("ACGU")
DNA = set("ACGT")

def nuc_ac_check(string):
    d_r_na = set(string.upper())

    if d_r_na == DNA:
        return "DNA"
    elif d_r_na == RNA:
        return "RNA"
    else:
        return "not an unambiguous nucleotide"


Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq= "acgsdtushnsdses"
Dna_seq = "ggatacgatc"

print('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))

The output is:

the rna seq variable is: RNA
the peptide seq variable is: not an unambiguous nucleotide
the Dna seq variable is: DNA

Upvotes: 0

Steve

Reputation: 1292

I'm not certain what you are trying to do, but this may do the job:

dna_set = set("ATGC")
rna_set = set("AUGC")
nucleotide_set = set("ATGCU")

def nuc_ac_check(string):
    str_set = set(string.upper())

    if str_set - nucleotide_set != set():
        return "Not an unambiguous nucleic acid"
    if str_set == rna_set:
        return "RNA"
    if str_set == dna_set:
        return "DNA"
    return "Not DNA or RNA"
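To see how this behaves, here is a small check against the three sequences from the question. The function is repeated so the snippet runs on its own; the samples dict and the "AAAA" edge case are added here purely for illustration.

```python
dna_set = set("ATGC")
rna_set = set("AUGC")
nucleotide_set = set("ATGCU")

def nuc_ac_check(string):
    str_set = set(string.upper())

    # Any letter outside A, T, G, C, U disqualifies the string outright.
    if str_set - nucleotide_set != set():
        return "Not an unambiguous nucleic acid"
    if str_set == rna_set:
        return "RNA"
    if str_set == dna_set:
        return "DNA"
    return "Not DNA or RNA"

# Sequences from the question, plus one hypothetical edge case ("AAAA").
samples = {
    "Rna_seq": "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU",
    "peptide_seq": "acgsdtushnsdses",
    "Dna_seq": "ggatacgatc",
    "edge_case": "AAAA",
}

results = {name: nuc_ac_check(seq) for name, seq in samples.items()}
for name, verdict in results.items():
    print(name, "->", verdict)
```

Note that "AAAA" falls through to "Not DNA or RNA", because the equality tests require all four letters to be present. Whether that is what you want depends on your requirements.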

Upvotes: 0

rassar

Reputation: 5680

You can't say:

elif ('A' and 'C' and 'G' and 'T') in input_string:

You have to say:

elif 'A' in input_string and 'C' in input_string and 'G' in input_string and 'T' in input_string:

Or you could make it more concise:

elif all(letter in input_string for letter in ('A', 'C', 'G', 'T')):

So try this:

if all(letter in input_string for letter in ('A', 'C', 'G', 'U')):
    return "Rna"
elif all(letter in input_string for letter in ('A', 'C', 'G', 'T')):
    return "Dna"
else:
    return "not an unambiguous nucleic acid"
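For context on why the parenthesised form fails: `and` returns its last operand when all operands are truthy, so the whole expression collapses to one letter before `in` is ever applied. A small sketch, where the test string "UUUU" and the variable names are made up for the demo:

```python
input_string = "UUUU"  # hypothetical input: contains U but no A, C or G

# `and` returns its last operand when every operand is truthy,
# so the parenthesised expression collapses to a single letter.
collapsed = ('A' and 'C' and 'G' and 'U')
print(collapsed)  # U

# The original test is therefore just `'U' in input_string`.
original_test = collapsed in input_string
print(original_test)  # True

# Checking each letter separately gives the intended answer.
per_letter_test = all(letter in input_string for letter in ('A', 'C', 'G', 'U'))
print(per_letter_test)  # False
```

Keep in mind that the per-letter test only confirms that the four letters are present. The peptide string from the question contains A, C, G and U once uppercased, so it would still be reported as "Rna" unless you also reject strings with other characters, for example with the set-based checks in the other answers.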

Upvotes: 0

Nishu Tayal

Reputation: 20880

.upper() converts the whole string to uppercase, so a check for a lowercase letter can never succeed afterwards. Expressions like ('A' and 'C' and 'G' and 'T') always evaluate to a single character (the last one, 'T' in this case), so you need to rewrite the if clauses. You could try something like this:

def containsAny(seq, aset):
    """ Check whether sequence seq contains ANY of the items in aset. """
    for c in seq:
        if c in aset: return True
    return False

def nuc_ac_check(string):
    input_string = string.upper() # update it as per your need

    if 'D' in input_string:  # uppercase, because input_string was uppercased above
        return "not an unambiguous nucleic acid"
    elif containsAny(input_string,['A','C','G','U']):
        return "Rna"
    elif containsAny(input_string,['A','C','G','T']):
        return "Dna"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq= "acgsdtushnsdses"
Dna_seq = "ggatacgatc"

print ('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print ('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print ('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))

Upvotes: 0
