Reputation: 21
I am trying to write a program that finds the genes in a genome sequence.
Biologists use a sequence of letters A, C, T and G to model a genome.
A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA.
Furthermore, the length of a gene string is a multiple of 3 and the gene does not contain any of the triplets ATG, TAG, TAA and TGA.
>>Enter a genome string:>>TTATGTTTTAAGGATGGGGCGTTAGTT
Output:
>>TTT
>>GGGCGT
>>Enter a genome string:>>TGTGTGTATAT
>>No gene is found
import re

def findGene(gene):
    pattern = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')
    return pattern.findall(gene)

findGene('TTATGTTTTAAGGATGGGGCGTTAGTT')

def main():
    geneinput = input("Enter a genome string: ")
    print(findGene(geneinput))

main()
# TTATGTTTTAAGGATGGGGCGTTAGTT
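With this code the genes are printed as a list (for example ['TTT', 'GGGCGT']), and an empty list is printed when nothing matches. How can I make it produce the output shown above?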
Thank you.
Upvotes: 0
Views: 262
Reputation: 21
import re

def findGene(gene):
    pattern = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')
    return pattern.findall(gene)

def main():
    geneinput = input("Enter a genome string: ")
    # findall returns an empty list when nothing matches; an empty list is
    # falsy, so `or` falls back to the message.
    print(findGene(geneinput) or 'No gene is found')

main()
# Example input: TTATGTTTTAAGGATGGGGCGTTAGTT
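If you also want each gene on its own line, as in the expected output, here is a minimal sketch. It is not the only way to do it, and it rests on a few assumptions: the names `find_genes` and `report` are made up for illustration, "does not contain any of the triplets" is read as "does not contain them at any offset", and empty genes (ATG immediately followed by a stop triplet) are dropped. The lookahead `(?=...)` is there so that every ATG is tried as a start, even one inside a rejected candidate.

```python
import re

# Triplets a gene may not contain (assumption: at any offset, not only in frame).
FORBIDDEN = ("ATG", "TAG", "TAA", "TGA")

# Zero-width lookahead so every ATG position is tried as a possible start.
# Group 1 is the run of whole triplets up to the first in-frame stop triplet.
CANDIDATE = re.compile(r'(?=ATG((?:[ACGT]{3})*?)(?:TAG|TAA|TGA))')

def find_genes(genome):
    """Return the candidate genes that pass the filter, in order of appearance."""
    genes = []
    for candidate in CANDIDATE.findall(genome):
        # Skip empty candidates and any that contain a forbidden triplet.
        if candidate and not any(t in candidate for t in FORBIDDEN):
            genes.append(candidate)
    return genes

def report(genome):
    """Return one gene per line, or the fallback message if none were found."""
    genes = find_genes(genome)
    return "\n".join(genes) if genes else "No gene is found"

print(report('TTATGTTTTAAGGATGGGGCGTTAGTT'))
print(report('TGTGTGTATAT'))
```

To read the genome from the user instead, pass `input("Enter a genome string: ")` to `report`.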
Upvotes: 1