Chief C

Reputation: 51

How can you check for specific characters in a string?

When I run the program, it always prints True. For example, if I enter AAJJ it prints True, because it only checks whether the first letter is valid. Can someone point me in the right direction? Thanks!

import string

squence_str = raw_input("Enter either A DNA, Protein or RNA sequence:")

def DnaCheck():

    for i in (squence_str):
        if string.upper(i) =="A":
            return True
        elif string.upper(i) == "T":
            return True
        elif string.upper(i) == "C":
            return True
        elif string.upper(i) == "G":
            return True
        else:
            return False

print "DNA ", DnaCheck()
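To see the problem concretely, here is a minimal Python 3 reproduction of the loop above. The names dna_check_buggy and seen are hypothetical, added only for illustration. The function records every character the loop actually examines. Because every branch ends in return, only the first character is ever looked at.

```python
def dna_check_buggy(sequence, seen):
    """Mirror of the loop above: every branch returns, so the loop body runs at most once."""
    for ch in sequence:
        seen.append(ch)  # record each character actually examined
        if ch.upper() in "ATCG":
            return True
        else:
            return False
    # Reached only for an empty string; the original falls off the end and returns None.
    return None


seen = []
print(dna_check_buggy("AAJJ", seen), seen)  # True ['A']
```

For an empty string the loop body never runs, so the original function returns None rather than True or False.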

Upvotes: 4

Views: 1789

Answers (4)

Alexander

Reputation: 109526

You need to check that all of the bases in the DNA sequence are valid.

def DnaCheck(sequence):
    dna = set('ACTG')
    return all(base.upper() in dna for base in sequence)

all(...) uses a generator expression to iterate over the nucleotides in the given DNA sequence, converting each to upper case and checking whether it is contained in the DNA set {'A', 'C', 'T', 'G'}. As soon as one character is not in this set, all() returns False without processing the remaining characters in the sequence. Otherwise it returns True once every character has been checked and found in the set.

For example, the sequence "axctgACTGACT" would return False after only processing the first two characters in the sequence, as "x" converted to the uppercase "X" is not in the DNA set {'A','C', 'T', 'G'} and thus the remaining characters in the sequence don't need to be checked.
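The short-circuiting can be observed directly. This sketch (the function name dna_check_counting and its helper is_valid are hypothetical) performs the same check as DnaCheck above, but also counts how many characters all() pulled from the generator before deciding.

```python
def dna_check_counting(sequence):
    """Same test as DnaCheck, but also reports how many characters all() consumed."""
    dna = set("ACTG")
    examined = 0

    def is_valid(base):
        nonlocal examined
        examined += 1  # count every character that all() actually asks about
        return base.upper() in dna

    result = all(is_valid(base) for base in sequence)
    return result, examined


print(dna_check_counting("axctgACTGACT"))  # (False, 2)
print(dna_check_counting("ACTGACTG"))      # (True, 8)
```

For the invalid sequence only "a" and "x" are examined. A valid sequence has to be scanned to the end, because all() cannot return True until every element has passed.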

Upvotes: 8

Gurjot Singh Sidhu

Reputation: 3

Check out this picture to see how this function works!

def DnaCheck(sequence):
    for base in sequence:
        if base.upper() in ('A', 'C', 'G', 'T'):
            continue
        else:
            print('False')
            return
    print('True')

As shown in the image above, we start by iterating through each base in the given sequence with a for loop. If a base belongs to the tuple ('A', 'C', 'G', 'T') (green signal), the function continues to the next base: continue jumps back to the top of the for loop without running the code below it. It keeps checking bases until it meets one that doesn't meet the criterion (red signal). At that point the else branch prints 'False' and the function terminates with return, so print('True') is never executed. For a valid sequence, the for loop ends after the last base and print('True') is executed.
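If the caller needs the verdict as a value (for example, to use in an if statement) rather than only as printed text, the same loop can be written with Python's for...else clause, whose else block runs only when the loop finishes without hitting break. This is a sketch of that alternative; the name dna_check_for_else is hypothetical. It still prints the result, as the function above does, and also returns it.

```python
def dna_check_for_else(sequence):
    for base in sequence:
        if base.upper() not in ("A", "C", "G", "T"):
            result = False
            break  # leave the loop early; the else clause below is skipped
    else:
        # Runs only when the loop finished without hitting break.
        result = True
    print(result)
    return result


dna_check_for_else("ACGT")  # prints True
dna_check_for_else("ACGX")  # prints False
```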

Upvotes: 0

Hugh Bothwell

Reputation: 56634

I like @Alexander's answer, but for variety, you could check whether

def dna_check(sequence):
    return set(sequence.upper()).issubset("ACGT")
    # another possibility:
    # return set(sequence).issubset("ACGTacgt")

might be faster on long sequences, especially if the odds of the sequence being legal are good (i.e., most of the time you will have to iterate over the whole sequence anyway).
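Whether the set-based version is actually faster is easy to measure on your own machine. The sketch below times both approaches with the standard timeit module. The 100,000-base input is a hypothetical test value. A fully valid sequence is the case where all() has to examine every character.

```python
import timeit


def dna_check_all(sequence):
    # @Alexander's approach: stops at the first invalid base
    dna = set("ACTG")
    return all(base.upper() in dna for base in sequence)


def dna_check_set(sequence):
    # The set-based approach above: always scans the whole string
    return set(sequence.upper()).issubset("ACGT")


# Hypothetical input: a valid 100,000-base sequence
long_valid = "ACGT" * 25_000

# Make sure both versions agree before comparing their speed
assert dna_check_all(long_valid) and dna_check_set(long_valid)

for fn in (dna_check_all, dna_check_set):
    seconds = timeit.timeit(lambda: fn(long_valid), number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```

No particular timings are claimed here, because they depend on the machine and the input. Note the trade-off: the set-based version always scans the whole string, whereas all() stops at the first invalid base. On a sequence with an early error, such as "axctgACTGACT", the generator version can therefore do far less work.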

Upvotes: 2

Prune

Reputation: 77837

More at the level of your current learning ...

You have the logic reversed. You have to check all the positions. If any one of them fails to identify as a nucleotide in "ACTG", you immediately return False for the string. Only when you've passed all of the characters can you confidently return True.

import string

def DnaCheck(squence_str):

    for i in (squence_str):
        if string.upper(i) not in "ACTG":
            return False

    return True

test_cases = ["", "AAJJ", "ACTG", "AACTGTCAA", "AACTGTCAX"]
for strand in test_cases:
    print strand, DnaCheck(strand)

Output:

 True
AAJJ False
ACTG True
AACTGTCAA True
AACTGTCAX False
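The code above is Python 2: it uses the print statement and the string.upper() function. In Python 3 the string module no longer provides upper() as a function, and print is a function. A port could look like the following sketch, which uses the str.upper() method instead. If you read the sequence from the user, note that Python 3's input() replaces raw_input().

```python
def dna_check(sequence):
    for base in sequence:
        # Any character outside A, C, T, G (in either case) fails the whole string
        if base.upper() not in "ACTG":
            return False
    # Only reached after every character has passed
    return True


test_cases = ["", "AAJJ", "ACTG", "AACTGTCAA", "AACTGTCAX"]
for strand in test_cases:
    print(strand, dna_check(strand))
```

This prints the same five lines as the Python 2 version, including " True" for the empty string.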

Upvotes: 0
