Search for a pattern in a string

I need to find the pattern ATG[any number of triplets of any characters][TGA or TAG or TGA]. Only the first ATG matters; any further ATG before the terminating [TGA or TAG or TAA] should be ignored.

The match should end at the first [TGA or TAG or TAA]. A string may contain several such matches, and they should not overlap.

For example, searching 'ATGcccATGgggTAGgATGtttTAA' should give 'ATGcccATGgggTAG' and 'ATGtttTAA' as the result.

Is there any way to do this in Python?

Upvotes: 0

Views: 95

Answers (2)

Rene

Reputation: 43

I'm not a pro, so there might be nicer and/or more efficient solutions, but this does the trick:

s = 'ATGcccATGgggTAGgATGtttTAA'
start = 'ATG'
stop = ['TGA', 'TAG', 'TAA']  # the three stop codons
results = []
i = 0
while i < len(s):
    found = False
    if s[i:i+3] == start:
        # scan forward for the first stop codon after the start
        for j in range(3, len(s) - i):
            if s[i+j:i+j+3] in stop:
                results.append(s[i:i+j+3])
                i += j + 3  # resume after the match so results do not overlap
                found = True
                break
    if not found:
        i += 1  # always advance, otherwise the loop never terminates
print(results)
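The question asks for whole triplets between the start and the stop, which the loop above does not enforce. A minimal sketch of a codon-by-codon variant follows; the names `find_orfs` and `STOP_CODONS` are made up for illustration, and it assumes the stop codons are TAA, TAG and TGA.

```python
STOP_CODONS = {'TAA', 'TAG', 'TGA'}  # assumed set of stop codons

def find_orfs(seq, start='ATG', stops=STOP_CODONS):
    """Return non-overlapping start..stop stretches, reading in triplets."""
    results = []
    i = 0
    while i <= len(seq) - 3:
        if seq[i:i+3] == start:
            # walk codon by codon from the start until a stop codon shows up
            for j in range(i + 3, len(seq) - 2, 3):
                if seq[j:j+3] in stops:
                    results.append(seq[i:j+3])
                    i = j + 3  # continue after the stop codon
                    break
            else:
                i += 1  # no in-frame stop found for this start
        else:
            i += 1
    return results

print(find_orfs('ATGcccATGgggTAGgATGtttTAA'))
# ['ATGcccATGgggTAG', 'ATGtttTAA']
```

Because the inner loop steps by three, a stop codon that is out of frame (for example the TAA in 'ATGcTAAccTGA') is skipped.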

Upvotes: 1

Daniel Roseman

Reputation: 600026

This is a job for a regex. (Note that your expected result does not seem to match your specification; you originally say you want to match up to TGA, TAG or TGA, but then in the result you match up to TAA. I'll assume the end of the string is meant to be TGA.)

import re
target = 'ATGcccATGgggTAGgATGtttTGA'
# .*? is non-greedy, so each match stops at the first stop codon after ATG
results = re.findall(r'ATG.*?(?:TAG|TGA)', target)
# ['ATGcccATGgggTAG', 'ATGtttTGA']
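If TAA should also count as a stop, and the part between start and stop has to consist of whole triplets as the question describes, a variant along these lines may fit. This is only a sketch under those assumptions; `ORF_RE` and `find_orfs_re` are names invented here.

```python
import re

# ATG, then as few whole triplets as possible, then the first in-frame stop codon.
ORF_RE = re.compile(r'ATG(?:.{3})*?(?:TAA|TAG|TGA)')

def find_orfs_re(seq):
    # No capturing groups in the pattern, so findall returns the full matches.
    return ORF_RE.findall(seq)

print(find_orfs_re('ATGcccATGgggTAGgATGtttTAA'))
# ['ATGcccATGgggTAG', 'ATGtttTAA']
```

`re.findall` scans left to right and returns non-overlapping matches, which covers the "should not overlap" requirement.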

Upvotes: 1
