Reputation: 237
I would like to create a python program that finds longest repeated substring, the most occurring one if there are multiple candidates. The first priority is longest, and only if there are two strings of the same length that are longest, should it go to most occurrences. The substrings are not allowed to overlap. Examples:
If there is a string of 4 characters that occurs twice, and a string of three characters that occurs three times, it would return the string of 4 characters.
If there is a string of 4 characters that occurs twice, and a string of 4 characters that occurs twice, it would return the one that occurs twice.
If there are no patterns that occur more than once, it would return the original string.
If you have the string "grhasdfjlksegkasdfdfshlgkjsdngbkj" it would return "asdf", because that is the substring that occurs most often.
This is the code I tried so far:
line = "grhasdfjlksegkasdfdfshlgkjsdngbkj"
li = list(line)
c = 0
pu = 0
m = {}
for s in li:
sta = s
if pu == 0:
css = c
oln = line
while True:
st = sta + li[css+1]
print(st)
ln = line
for i in range(pu):
ln = ln[:css+i] + ln[(css+i+1):]
if st in ln:
pu += 1
css += 1
oln = line
sta = st
else:
m[pu] = sta
line = oln
break
c += 1
else:
pu -= 1
Upvotes: 0
Views: 89
Reputation: 31354
Your approach could perhaps be fixed, but it seems to start from basic principles.
Why not use what Python has to offer:
def most_common_longest_recurring_substring(s):
# this works for non-overlapping substrings, as it uses str.count()
m, r = 0, '' # m is the best count so far, r the string in question
for n in range(len(s) - 1, 0, -1):
for i in range(len(s) - n):
if 1 < s.count(s[i:i+n]) > m:
m = s.count(s[i:i+n])
r = s[i:i+n]
# once any result has been found, stop because you favour length over count
if m > 0:
return r
x = most_common_longest_recurring_substring('abcde')
print(x)
x = most_common_longest_recurring_substring('abcdea')
print(x)
x = most_common_longest_recurring_substring('abcdabc')
print(x)
x = most_common_longest_recurring_substring('grhasdfjlksegkasdfdfshlgkjsdngbkj')
print(x)
Output:
None
a
abc
asdf
Upvotes: 2