rickel_pick
rickel_pick

Reputation: 47

finding the most amount of times a substring appears successively in a string

I have a long string of characters and not only am I trying to find if a substring of those characters exists within the larger string, I'm also trying to find the longest run of successive instances.

For example... in the code snippet below I've found that I can use "count" to see how many times the substring b appears in a. That result is 5. However, what I'm trying to identify is the longest successive run, which would be 3 (where 'abc' appears back to back to back in the middle). I'm having difficulty running through the logic of this one. Any advice would be appreciated.

a = "abcxyzabcabcabcxyzabcxyz"

b = "abc"

total = a.count(b)

print(total)

Upvotes: 2

Views: 313

Answers (4)

blhsing
blhsing

Reputation: 106553

You can use re.findall with a pattern that matches one or more times of b (use re.escape to prevent b from being interpreted as regex), then map the returning strings to len and pass them to max to obtain the length of the longest match, and then divide that length by the length of b to get the number of times b is repeated:

import re
max(map(len, re.findall('(?:%s)+' % re.escape(b), a))) // len(b)

Upvotes: 0

Mad Physicist
Mad Physicist

Reputation: 114320

Keep calling a.index on b with the appropriate indices. If the index is the start of your subset, you're in the same run. Otherwise, start a new run:

def longest_run(string, pattern):
    longest = 0
    current = 0
    start = 0
    while True:
        try:
            ind = string.index(pattern, start)
            if ind == start:
                current += 1
            else:
                if current > longest:
                    longest = current
                current = 1
            start += len(pattern)
        except ValueError:
            return longest

Upvotes: 0

wim
wim

Reputation: 362716

This should be fairly simple with a while loop:

def func(a, b): 
    n = 1 
    while b*n in a: 
        n += 1 
    return n - 1 

Upvotes: 1

ashiswin
ashiswin

Reputation: 637

One possible and naive solution is to use the python index function to identify the closest index of the substring. From there you can simply continue searching ahead for the substring until you find a point where it doesnt appear anymore, then call index again to skip ahead.

Example:

a = "abcxyzabcabcabcxyzabcxyz"
b = "abc"

curr_index = a.index(b)
longest_count = 0
current_count = 0

while curr_index < len(a):
    if a[curr_index : curr_index + len(b)] == b:
        curr_index += len(b)
        current_count += 1
    else:
        if longest_count < current_count:
            longest_count = current_count
        try:
            curr_index = a.index(b, curr_index)
        except ValueError:
            # Substring no longer found in string slice
            break
        current_count = 0

if longest_count < current_count:
    longest_count = current_count

print(longest_count)

This just returns the longest repeating count, but it doesn't return the location where it starts. Adding that functionality, however, is trivial.

Upvotes: 0

Related Questions