Matthew D
Matthew D

Reputation: 154

Common substring of length k

I'm trying to write a function that gets 2 strings and an integer 'k' and returns a common substring of both the strings of length k. (If there is more than 1, it returns one at random). There are alot of algorithms online that checks for the LONGEST common substring but I found none that checks the k-length substring.

I think hash tables is the correct way to do it if I want it to be optimized but I couldn't quite get it.

I could only write a function that checks if there is more than 1 k-length sequence in list. Here is what I got:

def repeat(st, k):
    for i in range(len(st) - k + 1):
        for j in range(i + 1, len(st) - k + 1):
            if st[i : i + k] == st[j : j + k]:
                return st[i : i + k]
    return False

I would appreciate ANY help with this... :/

Upvotes: 1

Views: 2912

Answers (2)

Eric
Eric

Reputation: 97661

This is by no means an efficient or clever solution:

def substrings_of(s, k):
    for i in xrange(0, len(s) - k + 1):
        yield s[i:i+k]

def common_substr(a, b, k):
    for a_s in substrings_of(a, k):
        for b_s in substrings_of(b, k):
            if a_s == b_s:
                return a_s

Upvotes: 1

Alfe
Alfe

Reputation: 59566

Simple version is just this:

def common_substr(a, b, k):
  for substr in (a[i:i+k] for i in range(len(a)-k+1)):
    if substr in b:
      return substr

I guess that especially for a very large input strings (e. g. megabytes of text) and large k this might be too inefficient and building up hashes of all possible substrings of length k can improve the speed:

def common_substr(a, b, k):
  substrs = set(a[i:i+k] for i in range(len(a)-k+1))
  for substr in (b[i:i+k] for i in range(len(b)-k+1)):
    if substr in substrs:
      return substr

But I guess that there are much cleverer algorithms around for this. Even the comparably simple strstr() (find string in string) has more efficient solutions than the straight-forward one which everybody can implement.

Upvotes: 3

Related Questions