Reputation: 11
I am trying to figure out why the second string, "peptide_seq", bypasses the first if statement. The function should convert the string to uppercase, then the three strings are supposed to go through the if statements and return the following:
- Strings that contain any character that is not A, C, G, T or U should return: "not an unambiguous nucleotide".
- Strings that contain ACGU should return "RNA".
- Strings that contain ACGT should return "DNA".
def nuc_ac_check(string):
    input_string = string.upper()
    if ('d') in input_string:
        return "not an unambiguous nucleic acid"
    elif ('A' and 'C' and 'G' and 'U') in input_string:
        return "Rna"
    elif ('A' and 'C' and 'G' and 'T') in input_string:
        return "Dna"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq = "acgsdtushnsdses"
Dna_seq = "ggatacgatc"

print('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))
Upvotes: 1
Views: 197
Reputation: 2381
The first condition should be fixed as follows:
if ('D') in input_string:
    return "not an unambiguous nucleic acid"
Note the uppercase 'D' in the condition: the string has already been converted with .upper(), so a lowercase 'd' can never be found in it.
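As a quick illustration of why the lowercase test never succeeds, the snippet below runs both membership checks on the question's peptide_seq value after upper-casing it. The variable names lower_found and upper_found are only for this example.

```python
peptide_seq = "acgsdtushnsdses"
input_string = peptide_seq.upper()  # "ACGSDTUSHNSDSES"

lower_found = 'd' in input_string   # no lowercase letters survive .upper()
upper_found = 'D' in input_string   # the original 'd' characters are now 'D'

print(lower_found, upper_found)     # False True
```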
Update: full code, incorporating the comments from Holloway, rassar, and Copperfield:
def nuc_ac_check(string):
    input_string = string.upper()
    rna_letter = ['A', 'C', 'G', 'U']
    dna_letter = ['A', 'C', 'G', 'T']
    if ('D') in input_string:
        return "not an unambiguous nucleic acid"
    # every character of the string must be one of the allowed letters
    elif all(letter in rna_letter for letter in input_string):
        return "Rna"
    elif all(letter in dna_letter for letter in input_string):
        return "Dna"
    return "Nothing of everything above"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq = "acgsdtushnsdses"
Dna_seq = "ggatacgatc"
weird_string = "AGCTEXF"

print('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))
print('the weird string variable is: ' + nuc_ac_check(weird_string))
Upvotes: 2
Reputation: 8520
Just asking whether the string has "d" or "D" in it is not enough to rule out an invalid string, because it may contain "E", "Z", "X", "1", "@" or any other character. The best solution for this, in my opinion, is to use a set, which reduces anything you give it to only its distinct elements.
For example:
>>> set("111122222222ddddddddRRRRRRRDDDDD")
{'D', '1', 'd', '2', 'R'}
Then you only need to compare against the sets of valid options to see whether there is a match:
RNA = set("ACGU")
DNA = set("ACGT")

def nuc_ac_check(string):
    d_r_na = set(string.upper())
    if d_r_na == DNA:
        return "DNA"
    elif d_r_na == RNA:
        return "RNA"
    else:
        return "not an unambiguous nucleotide"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq = "acgsdtushnsdses"
Dna_seq = "ggatacgatc"

print('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))
And the output is:
the rna seq variable is: RNA
the peptide seq variable is: not an unambiguous nucleotide
the Dna seq variable is: DNA
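One caveat with comparing for equality: a sequence is only recognised when all four letters occur in it, so a short DNA fragment such as "GGCC" would be rejected. A possible variant, sketched here on the assumption that "uses only letters from the alphabet" is what is wanted, uses subset tests instead. The name nuc_ac_check_subset is made up for this illustration, and a string made only of A, C and G is reported as DNA simply because that branch is tested first.

```python
RNA = set("ACGU")
DNA = set("ACGT")

def nuc_ac_check_subset(string):
    # Hypothetical variant: accept any non-empty string drawn only from one alphabet
    letters = set(string.upper())
    if not letters:
        return "not an unambiguous nucleotide"
    if letters <= DNA:      # every distinct letter is one of A, C, G, T
        return "DNA"
    if letters <= RNA:      # every distinct letter is one of A, C, G, U
        return "RNA"
    return "not an unambiguous nucleotide"

print(nuc_ac_check_subset("GGCC"))              # DNA
print(nuc_ac_check_subset("gguacg"))            # RNA
print(nuc_ac_check_subset("acgsdtushnsdses"))   # not an unambiguous nucleotide
```

A string that mixes T and U, such as "ATU", fits neither alphabet and falls through to the final return.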
Upvotes: 0
Reputation: 1292
I'm not certain what you are trying to do, but this may do the job:
dna_set = set("ATGC")
rna_set = set("AUGC")
nucleotide_set = set("ATGCU")

def nuc_ac_check(string):
    str_set = set(string.upper())
    if str_set - nucleotide_set != set():
        return "Not an unambiguous nucleic acid"
    if str_set == rna_set:
        return "RNA"
    if str_set == dna_set:
        return "DNA"
    return "Not DNA or RNA"
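The first test relies on set difference: str_set - nucleotide_set keeps only the characters that are not nucleotide letters, so it is empty exactly when the string is clean. A small sketch of that step on its own (the helper name invalid_chars is hypothetical, introduced only for this illustration):

```python
nucleotide_set = set("ATGCU")

def invalid_chars(string):
    # Hypothetical helper: characters of the string that are not A, T, G, C or U
    return set(string.upper()) - nucleotide_set

print(sorted(invalid_chars("acgsdtushnsdses")))  # ['D', 'E', 'H', 'N', 'S']
print(sorted(invalid_chars("ggatacgatc")))       # []
```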
Upvotes: 0
Reputation: 5680
You can't say:
elif ('A' and 'C' and 'G' and 'T') in input_string:
You have to say:
elif 'A' in input_string and 'C' in input_string and 'G' in input_string and 'T' in input_string:
Or you could make it more concise:
elif all(letter in input_string for letter in ('A', 'C', 'G', 'T')):
So try this:
if all(letter in input_string for letter in ('A', 'C', 'G', 'U')):
    return "Rna"
elif all(letter in input_string for letter in ('A', 'C', 'G', 'T')):
    return "Dna"
else:
    return "not an unambiguous nucleic acid"
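The reason the original condition does not work is that `and` between non-empty strings evaluates to its last operand, so the parenthesised expression collapses to a single letter before `in` is ever applied. A short demonstration using the question's DNA sequence (the variable names are only for this example):

```python
input_string = "ggatacgatc".upper()

# 'and' returns its last operand when all operands are truthy
collapsed = ('A' and 'C' and 'G' and 'U')
print(collapsed)                                 # U

# So the original test only asks whether 'U' occurs in the string
original_test = ('A' and 'C' and 'G' and 'U') in input_string
explicit_test = all(letter in input_string for letter in ('A', 'C', 'G', 'T'))
print(original_test, explicit_test)              # False True
```

This also explains the behaviour in the question: once the lowercase 'd' check fails, the second branch only tests for 'U', which the upper-cased peptide_seq contains.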
Upvotes: 0
Reputation: 20880
.upper() converts everything to uppercase, so a test for lowercase 'd' can never match afterwards. Conditions like ('A' and 'C' and 'G' and 'T') always evaluate to a single character (the last one, 'T'), so you need to update the if clauses.
You should try something like this.
def containsAny(seq, aset):
    """ Check whether sequence seq contains ANY of the items in aset. """
    for c in seq:
        if c in aset:
            return True
    return False

def nuc_ac_check(string):
    input_string = string.upper()  # update it as per your need
    if ('D') in input_string:  # uppercase, because input_string was upper-cased
        return "not an unambiguous nucleic acid"
    # Note: containsAny matches as soon as ONE listed letter is present,
    # so the Rna branch also catches DNA strings that contain A, C or G.
    elif containsAny(input_string, ['A', 'C', 'G', 'U']):
        return "Rna"
    elif containsAny(input_string, ['A', 'C', 'G', 'T']):
        return "Dna"

Rna_seq = "GGUACGGCUUGGUAUCCCACUCAGUGGCACCUGUGGCCU"
peptide_seq = "acgsdtushnsdses"
Dna_seq = "ggatacgatc"

print('the rna seq variable is: ' + nuc_ac_check(Rna_seq))
print('the peptide seq variable is: ' + nuc_ac_check(peptide_seq))
print('the Dna seq variable is: ' + nuc_ac_check(Dna_seq))
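For comparison, the built-ins any() and all() express the two possible readings of this check. containsAny above behaves like any(): it succeeds as soon as one listed letter is present. A sketch contrasting the two on the question's DNA sequence (the variable names are illustrative only):

```python
dna_seq = "ggatacgatc".upper()
rna_letters = ['A', 'C', 'G', 'U']

# Same result as containsAny(dna_seq, rna_letters): one shared letter is enough
has_any = any(c in rna_letters for c in dna_seq)

# Stricter reading: every character must be one of the listed letters
only_rna_letters = all(c in rna_letters for c in dna_seq)

print(has_any, only_rna_letters)  # True False
```

The stricter all() form is the one that tells the two alphabets apart, because the 'T' in the DNA string is not among the RNA letters.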
Upvotes: 0