Raj

Reputation: 21

Make basic genome sequence program work properly

I am trying to write a program that finds the genes in a genome string.

Context:

Biologists use a sequence of letters A, C, T and G to model a genome.
A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA.
Furthermore, the length of a gene string is a multiple of 3 and the gene does not contain any of the triplets ATG, TAG, TAA and TGA.

My desired result is:

>>Enter a genome string:>>TTATGTTTTAAGGATGGGGCGTTAGTT
Output:
>>TTT
>>GGGCGT
>>Enter a genome string:>>TGTGTGTATAT
>>No gene is found

So far I have got:

import re

def findGene(gene):
  pattern = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')
  return pattern.findall(gene)

  findGene('TTATGTTTTAAGGATGGGGCGTTAGTT')

def main():
  geneinput = input("Enter a genome string: ")
  print(findGene(geneinput))


main()

# TTATGTTTTAAGGATGGGGCGTTAGTT

How can I make this code print each gene on its own line, and print "No gene is found" when the genome contains no gene?

Thank you.

Upvotes: 0

Views: 262

Answers (1)

Raj

Reputation: 21

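findall() returns an empty list when nothing matches, and an empty list is falsy, so adding `or 'No gene is found'` to the print call covers the no-gene case:

import re

def findGene(gene):
    # Capture the triplets between ATG and the first in-frame TAG, TAA, or TGA.
    pattern = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')
    return pattern.findall(gene)

def main():
    geneinput = input("Enter a genome string: ")
    # An empty list is falsy, so `or` falls back to the message.
    print(findGene(geneinput) or 'No gene is found')


main()

# Sample input: TTATGTTTTAAGGATGGGGCGTTAGTT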
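
Printing the list directly shows it in list form (for example `['TTT', 'GGGCGT']`) rather than one gene per line as in the desired output. The sketch below is one possible way to match that output; it reuses the same pattern. The names `GENE_PATTERN` and `format_genes` are made up for this example, and dropping empty matches is an assumption about what should count as a gene.

```python
import re

# Same pattern as in the code above: ATG, then as few whole triplets as
# possible, then a stop triplet. Only the part in between is captured.
GENE_PATTERN = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')


def format_genes(genome):
    """Return the text to print: one gene per line, or a fallback message."""
    # Assumption: an empty match (ATG directly followed by a stop triplet)
    # does not count as a gene, so it is dropped here.
    genes = [g for g in GENE_PATTERN.findall(genome) if g]
    return '\n'.join(genes) if genes else 'No gene is found'


if __name__ == '__main__':
    print(format_genes('TTATGTTTTAAGGATGGGGCGTTAGTT'))
    print(format_genes('TGTGTGTATAT'))
```

Note that the pattern only checks stop triplets at in-frame positions, so a captured gene could still contain ATG or an out-of-frame stop triplet. If the rule is meant to forbid those anywhere in the gene, an extra check would be needed.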

Upvotes: 1
