Reputation: 35
As a novice in Python, I have written a working function that will compare two strings and search for the longest substring shared by both strings. For instance, when the function compares "goggle" and "google", it will identify "go" and "gle" as the two common substrings (excluding single letters), but will only return "gle" since it's the longest one.
I would like to know if anywhere part of my code can be improved/re-written, as it may be considered lengthy and convoluted. I'll also be very glad to see other approaches to the solution. Thanks in advance!
def longsub(string1, string2):
sublist = []
i=j=a=b=count=length=0
while i < len(string1):
while j < len(string2):
if string1[i:a+1] == string2[j:b+1] and (a+1) <= len(string1) and (b+1) <= len(string2):
a+=1
b+=1
count+=1
else:
if count > 0:
sublist.append(string1[i:a])
count = 0
j+=1
b=j
a=i
j=b=0
i+=1
a=i
while len(sublist) > 1:
for each in sublist:
if len(each) >= length:
length = len(each)
else:
sublist.remove(each)
return sublist[0]
Edit: Comparing "goggle" and "google" may have been a bad example, since they are equal length with longest common segments in the same positions. The actual inputs would be closer to this: "xabcdkejp" and "zkdieaboabcd". Correct output should be "abcd".
Upvotes: 1
Views: 5593
Reputation: 1180
EDIT: This algorithm only works when the words have the longest segment in the same indices
You can get away with only one loop. Use helper variables. Something like these (needs refactoring) http://codepad.org/qErRBPav:
word1 = "google"
word2 = "goggle"
longestSegment = ""
tempSegment = ""
for i in range(len(word1)):
if word1[i] == word2[i]:
tempSegment += word1[i]
else: tempSegment = ""
if len(tempSegment) > len(longestSegment):
longestSegment = tempSegment
print longestSegment # "gle"
EDIT: mgilson's proposal of using find_longest_match
(works for varying positions of the segments):
from difflib import SequenceMatcher
word1 = "google"
word2 = "goggle"
s = SequenceMatcher(None, word1, word2)
match = s.find_longest_match(0, len(word1), 0, len(word2))
print word1[match.a:(match.b+match.size)] # "gle"
Upvotes: 2
Reputation: 310307
There actually happens to be a function for this in the standard library: difflib.SequencMatcher.find_longest_match
Upvotes: 4