Common substring of length k

Question

I'm trying to write a function that gets 2 strings and an integer 'k' and returns a common substring of both the strings of length k. (If there is more than 1, it returns one at random). There are alot of algorithms online that checks for the LONGEST common substring but I found none that checks the k-length substring.

I think hash tables is the correct way to do it if I want it to be optimized but I couldn't quite get it.

I could only write a function that checks if there is more than 1 k-length sequence in list. Here is what I got:

def repeat(st, k):
    for i in range(len(st) - k + 1):
        for j in range(i + 1, len(st) - k + 1):
            if st[i : i + k] == st[j : j + k]:
                return st[i : i + k]
    return False

I would appreciate ANY help with this... :/

Alfe · Accepted Answer

Simple version is just this:

def common_substr(a, b, k):
  for substr in (a[i:i+k] for i in range(len(a)-k+1)):
    if substr in b:
      return substr

I guess that especially for a very large input strings (e. g. megabytes of text) and large k this might be too inefficient and building up hashes of all possible substrings of length k can improve the speed:

def common_substr(a, b, k):
  substrs = set(a[i:i+k] for i in range(len(a)-k+1))
  for substr in (b[i:i+k] for i in range(len(b)-k+1)):
    if substr in substrs:
      return substr

But I guess that there are much cleverer algorithms around for this. Even the comparably simple strstr() (find string in string) has more efficient solutions than the straight-forward one which everybody can implement.

Common substring of length k

Answers (2)

Related Questions