Finding the consecutive substring match

Question

I have, say, two strings:

str1="wild animals are trying to escape the deserted jungle to the sandy island"
str2="people are trying to escape from the smoky mountain to the sandy road"

To find the matches between these two strings, k-grams of a fixed length (here 10) are generated from the text with spaces removed, each k-gram is hashed, and the hashes from the two strings are compared. Suppose, for example, that the matching k-grams from these two strings are:

['aretryingt', 'etryingtoe', 'ngtoescape', 'tothesandy']

Please suggest an efficient way of finding the consecutive substring match from these k-grams, that is, the longest stretch covered by overlapping matching k-grams. In the above case the expected answer would be

"aretryingtoescape"

Thanking you in advance!!!

Ignacio Vazquez-Abrams · Accepted Answer

First build a coverage mask of 0s and 1s (or other characters if you prefer) that marks which positions of the string are covered by a matching k-gram, then find the longest run of 1s with itertools.groupby().
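
A minimal sketch of that idea follows. It is one possible reading of the suggestion, not necessarily the exact code the answer had in mind. The function name longest_covered_run is hypothetical. The sketch assumes that the k-grams were taken from the text with whitespace removed (as the example k-grams suggest), and it marks every occurrence of each k-gram in the string.

```python
from itertools import groupby


def longest_covered_run(text, kgrams):
    """Return the longest stretch of `text` (whitespace removed) covered by `kgrams`.

    Hypothetical helper: assumes the k-grams were built from the
    whitespace-stripped text.
    """
    stripped = "".join(text.split())

    # Coverage mask: 1 where a matching k-gram covers the position, else 0.
    mask = [0] * len(stripped)
    for gram in kgrams:
        start = stripped.find(gram)
        while start != -1:
            for i in range(start, start + len(gram)):
                mask[i] = 1
            # Keep searching so that repeated occurrences are marked too.
            start = stripped.find(gram, start + 1)

    # Walk the runs of equal mask values and remember the longest run of 1s.
    best_start, best_len, pos = 0, 0, 0
    for bit, run in groupby(mask):
        length = sum(1 for _ in run)
        if bit == 1 and length > best_len:
            best_start, best_len = pos, length
        pos += length

    return stripped[best_start:best_start + best_len]


str1 = "wild animals are trying to escape the deserted jungle to the sandy island"
kgrams = ['aretryingt', 'etryingtoe', 'ngtoescape', 'tothesandy']
print(longest_covered_run(str1, kgrams))  # aretryingtoescape
```

The first three k-grams overlap and merge into a single run of 1s in the mask, while 'tothesandy' forms a separate, shorter run, so the longest run corresponds to "aretryingtoescape".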
