Henrique Baesa

Reputation: 25

Finding patterns in strings in Python

My function is supposed to receive a large string, go through it, and find the maximum number of times the pattern "AGATC" repeats consecutively. Regardless of what I feed this function, it always returns 1.

def agatc(s):
    maxrep = 0
    temp = 0
    for i in range(len(s) - 4):
        if s[i] == "A" and s[i + 1] == "G" and s[i + 2] == "A" and s[i + 3] == "T" and s[i + 4] == "C":
            temp += 1
            print(i)
            i += 3
        else:
            if temp > maxrep:
                maxrep = temp
            temp = 0
    return maxrep

I also tried initializing the for loop with range(0, len(s) - 4, 1) and got the same result.

I thought the problem might be in adding 3 to the i variable (apparently it wasn't), so I added print(i) to see what was happening. I got the following:

45
1938
2049
2195
2952
2957
2962
2967
2972
2977
2982
2987
2992
2997
3002
3007
3012
3017
3022
3689
4754

Upvotes: 1

Views: 91

Answers (4)

Riccardo Bucco

Reputation: 15364

In this way you can find the number of overlapping matches:

def agatc(s):
    temp = 0
    for i in range(len(s) - len("AGATC") + 1):
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
    return temp

If you want to find non-overlapping matches:

def agatc(s):
    temp = 0
    i = 0
    while i < len(s) - len("AGATC") + 1:
        if s[i:i+len("AGATC")] == "AGATC":
            temp += 1
            i += len("AGATC")
        else:
            i += 1
    return temp
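
Both functions above count every occurrence rather than the longest consecutive run the question asks for. A minimal sketch of a run-tracking variant built on the same while loop follows; the name longest_run and the pattern parameter are hypothetical, and it assumes the pattern cannot overlap with itself (true for "AGATC"). The while loop matters here because reassigning i inside "for i in range(...)" has no effect on the next iteration.

```python
def longest_run(s, pattern="AGATC"):
    """Return the largest number of back-to-back copies of pattern in s.

    Assumes pattern does not overlap with itself (e.g. "AGATC").
    """
    n = len(pattern)
    if n == 0:
        return 0  # an empty pattern would never advance the index
    best = 0      # longest run seen so far
    current = 0   # length of the run currently being extended
    i = 0
    while i <= len(s) - n:
        if s[i:i + n] == pattern:
            current += 1
            best = max(best, current)
            i += n  # jump past the match; the next copy must start here
        else:
            current = 0  # the run is broken
            i += 1
    return best
```

Updating best inside the matching branch also covers a run that ends exactly at the end of the string.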

Upvotes: 3

Red

Reputation: 27557

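This function counts the greatest number of consecutive 'AGATC's in a string and returns it:

import re

def agatc(s):
    w = "AGATC"
    maxrep = [m.start() for m in re.finditer(w, s)]  # The beginning index for each AGATC
    if not maxrep:
        return 0  # no occurrence at all
    c = ''
    for i, v in enumerate(maxrep):
        if i < len(maxrep) - 1:
            if v + 5 == maxrep[i + 1]:
                c += 'y'  # the next match starts right after this one
            else:
                c += 'n'  # there is a gap before the next match

    # the longest block of 'y's links that many matches plus one
    return len(max(c.split('n'))) + 1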

print(agatc("oooooooooAGATCooooAGATCAGATCAGATCAGATCooooooAGATCAGATC"))

Output:

4

Upvotes: 0

Sơn Ninh

Reputation: 331

A simple solution using the re module. It prints the (start, end) span of the longest run of consecutive 'AGATC's:

import re

s = 'FGHAGATCATCFJSFAGATCAGATCFHGH'
match = re.finditer('(?P<name>AGATC)+', s)
max_len = 0
result = tuple()
for m in match:
    l = m.end() - m.start()
    if l > max_len:
        max_len = l
        result = (m.start(), m.end())

print(result)
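
To get the number of repeats instead of the span, the length of the longest run can be divided by the length of the pattern. A short sketch of that idea, assuming a pattern like "AGATC" that cannot overlap with itself; the name max_repeats and the pattern parameter are hypothetical:

```python
import re

def max_repeats(s, pattern="AGATC"):
    """Return the largest number of consecutive copies of pattern in s."""
    # (?:...)+ matches one or more back-to-back copies of the pattern
    runs = re.finditer(f"(?:{re.escape(pattern)})+", s)
    # every run's length is a multiple of len(pattern)
    return max((len(m.group(0)) // len(pattern) for m in runs), default=0)

print(max_repeats('FGHAGATCATCFJSFAGATCAGATCFHGH'))  # 2
```

The default=0 argument covers strings with no match at all.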

Upvotes: 1

Ronald

Reputation: 3305

Personally I would use regular expressions. But if you do not want that, you could use the str.find() method. Here is my solution:

def agatc(s):
    cnt = 0
    findstr='aga'                             # pattern you are looking for
    for i in range(len(s)):
        index = s.find(findstr)
        if index != -1:
            cnt+=1
            s = s[index+1:]                   # overlapping matches
            # s = s[index+len(findstr):]      # non-overlapping matches only
            print(index, s)                   # just to see what happens
    return cnt
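
str.find() also accepts a start index, so the string does not have to be re-sliced after every hit. A sketch of the same counting idea along those lines; count_matches and the overlapping flag are hypothetical names, and like the function above it counts occurrences rather than consecutive runs:

```python
def count_matches(s, findstr, overlapping=True):
    """Count occurrences of findstr in s using str.find with a start index."""
    if not findstr:
        return 0  # an empty pattern would match at every position
    cnt = 0
    start = 0
    # move one character for overlapping matches, a whole pattern otherwise
    step = 1 if overlapping else len(findstr)
    while True:
        index = s.find(findstr, start)
        if index == -1:
            return cnt
        cnt += 1
        start = index + step
```

For example, count_matches("agaga", "aga") gives 2, while count_matches("agaga", "aga", overlapping=False) gives 1.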

Upvotes: 0
