peianwu
peianwu

Reputation: 35

Python: Compare two strings and return the longest segment that they have in common

As a novice in Python, I have written a working function that will compare two strings and search for the longest substring shared by both strings. For instance, when the function compares "goggle" and "google", it will identify "go" and "gle" as the two common substrings (excluding single letters), but will only return "gle" since it's the longest one.

I would like to know if anywhere part of my code can be improved/re-written, as it may be considered lengthy and convoluted. I'll also be very glad to see other approaches to the solution. Thanks in advance!

def longsub(string1, string2):
    sublist = []
    i=j=a=b=count=length=0

    while i < len(string1):
        while j < len(string2):
            if string1[i:a+1] == string2[j:b+1] and (a+1) <= len(string1) and (b+1) <= len(string2):
                a+=1
                b+=1
                count+=1
            else:
                if count > 0:
                    sublist.append(string1[i:a])
                count = 0
                j+=1
                b=j
                a=i
        j=b=0
        i+=1
        a=i

    while len(sublist) > 1:
        for each in sublist:
            if len(each) >= length:
                length = len(each)
            else:
                sublist.remove(each)

    return sublist[0]

Edit: Comparing "goggle" and "google" may have been a bad example, since they are equal length with longest common segments in the same positions. The actual inputs would be closer to this: "xabcdkejp" and "zkdieaboabcd". Correct output should be "abcd".

Upvotes: 1

Views: 5593

Answers (2)

Sami N
Sami N

Reputation: 1180

EDIT: This algorithm only works when the words have the longest segment in the same indices

You can get away with only one loop. Use helper variables. Something like these (needs refactoring) http://codepad.org/qErRBPav:

word1 = "google"
word2 = "goggle"

longestSegment = ""
tempSegment = ""

for i in range(len(word1)):
    if word1[i] == word2[i]:
        tempSegment += word1[i]
    else: tempSegment = ""

    if len(tempSegment) > len(longestSegment):
        longestSegment = tempSegment

print longestSegment # "gle"

EDIT: mgilson's proposal of using find_longest_match (works for varying positions of the segments):

from difflib import SequenceMatcher

word1 = "google"
word2 = "goggle"

s = SequenceMatcher(None, word1, word2)
match = s.find_longest_match(0, len(word1), 0, len(word2))

print word1[match.a:(match.b+match.size)] # "gle"

Upvotes: 2

mgilson
mgilson

Reputation: 310307

There actually happens to be a function for this in the standard library: difflib.SequencMatcher.find_longest_match

Upvotes: 4

Related Questions