Reputation: 47
I have a long string of characters and not only am I trying to find if a substring of those characters exists within the larger string, I'm also trying to find the longest run of successive instances.
For example... in the code snippet below I've found that I can use "count" to see how many times the substring b appears in a. That result is 5. However, what I'm trying to identify is the longest successive run, which would be 3 (where 'abc' appears back to back to back in the middle). I'm having difficulty running through the logic of this one. Any advice would be appreciated.
a = "abcxyzabcabcabcxyzabcxyz"
b = "abc"
total = a.count(b)
print(total)
Upvotes: 2
Views: 313
Reputation: 106553
You can use re.findall
with a pattern that matches one or more times of b
(use re.escape
to prevent b
from being interpreted as regex), then map the returning strings to len
and pass them to max
to obtain the length of the longest match, and then divide that length by the length of b
to get the number of times b
is repeated:
import re
max(map(len, re.findall('(?:%s)+' % re.escape(b), a))) // len(b)
Upvotes: 0
Reputation: 114320
Keep calling a.index
on b
with the appropriate indices. If the index is the start of your subset, you're in the same run. Otherwise, start a new run:
def longest_run(string, pattern):
longest = 0
current = 0
start = 0
while True:
try:
ind = string.index(pattern, start)
if ind == start:
current += 1
else:
if current > longest:
longest = current
current = 1
start += len(pattern)
except ValueError:
return longest
Upvotes: 0
Reputation: 362716
This should be fairly simple with a while loop:
def func(a, b):
n = 1
while b*n in a:
n += 1
return n - 1
Upvotes: 1
Reputation: 637
One possible and naive solution is to use the python index
function to identify the closest index of the substring. From there you can simply continue searching ahead for the substring until you find a point where it doesnt appear anymore, then call index
again to skip ahead.
Example:
a = "abcxyzabcabcabcxyzabcxyz"
b = "abc"
curr_index = a.index(b)
longest_count = 0
current_count = 0
while curr_index < len(a):
if a[curr_index : curr_index + len(b)] == b:
curr_index += len(b)
current_count += 1
else:
if longest_count < current_count:
longest_count = current_count
try:
curr_index = a.index(b, curr_index)
except ValueError:
# Substring no longer found in string slice
break
current_count = 0
if longest_count < current_count:
longest_count = current_count
print(longest_count)
This just returns the longest repeating count, but it doesn't return the location where it starts. Adding that functionality, however, is trivial.
Upvotes: 0