Reputation: 154
I'm trying to write a function that gets 2 strings and an integer 'k' and returns a common substring of both the strings of length k. (If there is more than 1, it returns one at random). There are alot of algorithms online that checks for the LONGEST common substring but I found none that checks the k-length substring.
I think hash tables is the correct way to do it if I want it to be optimized but I couldn't quite get it.
I could only write a function that checks if there is more than 1 k-length sequence in list. Here is what I got:
def repeat(st, k):
for i in range(len(st) - k + 1):
for j in range(i + 1, len(st) - k + 1):
if st[i : i + k] == st[j : j + k]:
return st[i : i + k]
return False
I would appreciate ANY help with this... :/
Upvotes: 1
Views: 2912
Reputation: 97661
This is by no means an efficient or clever solution:
def substrings_of(s, k):
for i in xrange(0, len(s) - k + 1):
yield s[i:i+k]
def common_substr(a, b, k):
for a_s in substrings_of(a, k):
for b_s in substrings_of(b, k):
if a_s == b_s:
return a_s
Upvotes: 1
Reputation: 59566
Simple version is just this:
def common_substr(a, b, k):
for substr in (a[i:i+k] for i in range(len(a)-k+1)):
if substr in b:
return substr
I guess that especially for a very large input strings (e. g. megabytes of text) and large k
this might be too inefficient and building up hashes of all possible substrings of length k
can improve the speed:
def common_substr(a, b, k):
substrs = set(a[i:i+k] for i in range(len(a)-k+1))
for substr in (b[i:i+k] for i in range(len(b)-k+1)):
if substr in substrs:
return substr
But I guess that there are much cleverer algorithms around for this. Even the comparably simple strstr()
(find string in string) has more efficient solutions than the straight-forward one which everybody can implement.
Upvotes: 3