Alex DeStefano
Alex DeStefano

Reputation: 237

Find longest repeated substring, most occurring if multiple candidates

I would like to create a python program that finds longest repeated substring, the most occurring one if there are multiple candidates. The first priority is longest, and only if there are two strings of the same length that are longest, should it go to most occurrences. The substrings are not allowed to overlap. Examples:

This is the code I tried so far:

line = "grhasdfjlksegkasdfdfshlgkjsdngbkj"
li = list(line)
c = 0
pu = 0
m = {}
for s in li:
    sta = s
    if pu == 0:
        css = c
        oln = line
        while True:
            st = sta + li[css+1]
            print(st)
            ln = line
            for i in range(pu):
                ln = ln[:css+i] + ln[(css+i+1):]
            if st in ln:
                pu += 1
                css += 1
                oln = line
                sta = st
            else:
                m[pu] = sta
                line = oln
                break
            c += 1
    else:
        pu -= 1

Upvotes: 0

Views: 89

Answers (1)

Grismar
Grismar

Reputation: 31354

Your approach could perhaps be fixed, but it seems to start from basic principles.

Why not use what Python has to offer:

def most_common_longest_recurring_substring(s):
    # this works for non-overlapping substrings, as it uses str.count()
    m, r = 0, ''  # m is the best count so far, r the string in question
    for n in range(len(s) - 1, 0, -1):
        for i in range(len(s) - n):
            if 1 < s.count(s[i:i+n]) > m:
                m = s.count(s[i:i+n])
                r = s[i:i+n]
        # once any result has been found, stop because you favour length over count
        if m > 0:
            return r


x = most_common_longest_recurring_substring('abcde')
print(x)
x = most_common_longest_recurring_substring('abcdea')
print(x)
x = most_common_longest_recurring_substring('abcdabc')
print(x)
x = most_common_longest_recurring_substring('grhasdfjlksegkasdfdfshlgkjsdngbkj')
print(x)

Output:

None
a
abc
asdf

Upvotes: 2

Related Questions