liya77

Reputation: 121

Determining if a sequence is a valid DNA sequence

I'm trying to write a program that reads a sequence into a string variable called sequence and determines whether it is a valid DNA sequence. I want to use a single for loop and one if-elif-else statement to decide whether the sequence is valid DNA.
This is what I have written so far:

sequence = input("Please enter a sequence: ").upper()
valid_dna = "ACGT"
sequence = sequence.replace(" ", "")

common=0
for eachletter in sequence:
    if eachletter in valid_dna:
        common +=1

print("This is a valid dna sequence")

elif sequence != valid_dna:
    print("This is not a valid DNA sequence")

else:
    print()

I don't know what to put after elif: whatever I add there produces a syntax error.

I originally had

sequence = input().upper()
sequence= input("Please enter a sequence:  ")

which didn't work well together. Thank you to VHarisop for pointing it out!

Update: This is what I have now, and it works!

sequence = input().upper()
valid_dna = "ACGT"
sequence = sequence.replace(" ", "")

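count = 1  # assume the sequence is valid until an invalid character is found
for i in sequence:
    if i not in valid_dna:
        count = 0  # stays 0 even if later characters are valid
if count == 1:
    print("This is a valid DNA sequence.")
else:
    print("This is an invalid DNA sequence.")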

Upvotes: 1

Views: 10746

Answers (3)

Mohanad

Reputation: 11

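def is_valid_sequence(dna):
    # Collect every character that is not a valid DNA base
    char_invalid = ''
    for char in dna:
        if char not in 'ATCG':
            char_invalid = char_invalid + char
    # The sequence is valid only if no invalid characters were collected
    return not char_invalid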

Upvotes: -1

Cory Kramer

Reputation: 117856

I would just use all() and a generator expression:

>>> valid = 'ACTG'

>>> s1 = 'ATAGCGGCAT'
>>> all(i in valid for i in s1)
True

>>> s2 = 'ABCDEFHI'
>>> all(i in valid for i in s2)
False

If you have to use a for loop and if statements because this is a homework requirement, you can apply the same idea:

def validSequence(s):
    valid = 'ACTG'
    for letter in s:
        if letter not in valid:
            return False
    return True

>>> validSequence('ATAGCGGCAT')
True
>>> validSequence('ABCDEFHIJK')
False

Upvotes: 6

VHarisop

Reputation: 2826

First of all, you have:

sequence = input().upper()
# irrelevant code
sequence= input("Please enter a sequence:  ")

This asks for input twice. The first input is converted to uppercase, but it is then overwritten by the second, which is left untouched, so the uppercase conversion is lost. I would recommend keeping only:

sequence = input('Please enter a sequence: ').upper()

and then using a generator expression to check validity.

Actually, there is no need to keep a separate string for non-valid characters. Just do:

valid_dna = 'ACGT'
sequence = input('Please enter a sequence: ').upper()

# will print True if every character in the sequence belongs to valid_dna
print(all(i in valid_dna for i in sequence))

Here, the generator expression (i in valid_dna for i in sequence) yields True for every character of the sequence that belongs to valid_dna and False for every character that does not. The built-in function all() returns True only if every value produced by the expression is True.
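To make this concrete, the short sketch below stores the generated values in a list so they can be inspected. The sample string and the name flags are illustrative only and do not appear in the code above.

```python
valid_dna = 'ACGT'
sequence = 'ACXT'  # illustrative input containing one invalid character

# One boolean per character: True if the character belongs to valid_dna
flags = [i in valid_dna for i in sequence]
print(flags)       # [True, True, False, True]

# all() needs every value to be True; any() needs only one
print(all(flags))  # False
print(any(flags))  # True
```

With a generator expression instead of a list, all() stops at the first False, so long invalid sequences are rejected early.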

If you want a proper message, you can simply check the return value of the expression and print accordingly:

condition = all(i in valid_dna for i in sequence)
print('Valid sequence' if condition else 'Invalid sequence')
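If it helps, the pieces can be gathered into one function, as in the rough sketch below. The name describe_sequence and its parameters are hypothetical, and the space removal mirrors what the question does rather than anything required by all().

```python
def describe_sequence(raw, valid_dna='ACGT'):
    """Return a message saying whether raw contains only valid DNA letters."""
    # Normalize the way the question does: uppercase and drop spaces
    sequence = raw.upper().replace(' ', '')
    condition = all(i in valid_dna for i in sequence)
    return 'Valid sequence' if condition else 'Invalid sequence'

print(describe_sequence('acg t'))  # Valid sequence
print(describe_sequence('ACGU'))   # Invalid sequence
```

Note that all() returns True for an empty iterable, so an empty input is reported as valid here; add an explicit check if that is not what you want.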

Upvotes: 1
