Reputation: 4281

Longest Common Substring non-DP solution with O(m*n)

The definition of the problem is:

Given two strings, find the longest common substring.

Return the length of it.

I was solving this problem, and I think I solved it in O(m*n) time. However, every solution I look up describes dynamic programming as the optimal approach, for example http://www.geeksforgeeks.org/longest-common-substring/

Here's my solution, you can test it here: http://www.lintcode.com/en/problem/longest-common-substring/

int longestCommonSubstring(string &A, string &B) {

    int ans = 0;
    for (int i=0; i<A.length(); i++) {
        int counter = 0;
        int k = i;
        for (int j=0; j<B.length() && k <A.length(); j++) {

            if (A[k]!=B[j]) {
                counter = 0;
                k = i;

            } else {
                k++;
                counter++;
                ans = max(ans, counter);

            }  
        }
    }

    return ans;        
}

My idea is simple: start from the first position of string A and find the longest substring I can match in string B, then start from the second position of string A and do the same, and so on.

Is there something wrong with my solution? Or is it not O(m*n) complexity?

Upvotes: 0

Answers (2)

Paul Hankin

Reputation: 58201

Good news: your algorithm is O(mn). Bad news: it doesn't work correctly.

Your inner loop is wrong: it's intended to find the longest initial substring of A[i:] in B, but it works like this:

j = 0
While j < len(B)
   Match as much of A[i:] against B[j:]. Call it s.
   Remember s if it's the longest so far found.
   j += len(s) + 1   (the mismatched character of B is skipped as well)

This fails to find the longest match. For example, when A = "XXY" and B = "XXXY" and i=0 it'll find "XX" as the longest match instead of the complete match "XXY".

Here's a runnable version of your code (lightly transcribed into C) that shows the faulty result:

#include <string.h>
#include <stdio.h>

int lcs(const char* A, const char* B) {
    int al = strlen(A);
    int bl = strlen(B);
    int ans = 0;
    for (int i=0; i<al; i++) {
        int counter = 0;
        int k = i;
        for (int j=0; j<bl && k<al; j++) {
            if (A[k]!=B[j]) {
                counter = 0;
                k = i;
            } else {
                k++;
                counter++;
                if (counter >= ans) ans = counter;
            }  
        }
    }
    return ans;        
}

int main(int argc, char**argv) {
    printf("%d\n", lcs("XXY", "XXXY"));
    return 0;
}

Running this program outputs "2".
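
For contrast, here is a minimal sketch of one way to get a correct O(mn) result without a DP table. It is not the original poster's method: it scans each diagonal of the implicit m-by-n comparison grid and tracks the longest run of consecutive equal characters. The function name `longestCommonSubstringDiagonal` is made up for illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper, not from the original post.
// Every pair (i, j) lies on exactly one diagonal d = i - j, so each pair is
// compared once: O(m*n) time, O(1) extra space.
int longestCommonSubstringDiagonal(const std::string &A, const std::string &B) {
    const int m = static_cast<int>(A.length());
    const int n = static_cast<int>(B.length());
    int ans = 0;
    // Diagonals range from d = -(n-1) (top-right corner) to d = m-1 (bottom-left).
    for (int d = -(n - 1); d <= m - 1; d++) {
        int i = std::max(d, 0);  // first index into A on this diagonal
        int j = i - d;           // matching index into B
        int run = 0;             // length of the current run of equal characters
        for (; i < m && j < n; i++, j++) {
            if (A[i] == B[j]) {
                run++;
                ans = std::max(ans, run);
            } else {
                run = 0;         // a mismatch ends the run on this diagonal only
            }
        }
    }
    return ans;
}
```

With A = "XXY" and B = "XXXY", the diagonal d = -1 pairs A[0..2] with B[1..3], so this sketch returns 3 rather than 2. Unlike the loop in the question, a mismatch never changes which characters get compared next, so no alignment is skipped.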

Upvotes: 2

WilliamComputerScience

Reputation: 87

Your solution is O(nm), and if you compare its structure to the provided algorithm, it's exactly the same; however, yours does not memoize.

One advantage of the dynamic programming algorithm in the link is that, within the same time complexity class, it can recall different substring lengths in O(1); otherwise, it looks good to me.

This kind of thing happens from time to time, because storing subproblem solutions does not always give a better running time on the first call; it can end up in the same complexity class instead. For example, try computing the nth Fibonacci number with a dynamic programming solution and compare that to a tail-recursive solution. Note that in that case, as in yours, once the array has been filled the first time, it's faster to return an answer on each successive call.
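
For comparison, here is a rough sketch of the dynamic programming approach mentioned above. The linked article is not reproduced here, so this assumes the usual longest-common-suffix recurrence; the name `longestCommonSubstringDP` and the two-row storage are illustrative choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, assuming the standard recurrence:
//   suffix(i, j) = suffix(i-1, j-1) + 1  if A[i-1] == B[j-1], else 0
// where suffix(i, j) is the longest common suffix of A[0..i) and B[0..j).
int longestCommonSubstringDP(const std::string &A, const std::string &B) {
    const std::size_t m = A.length();
    const std::size_t n = B.length();
    // Only the previous row is needed, so two rows stand in for the full table.
    std::vector<int> prev(n + 1, 0), cur(n + 1, 0);
    int ans = 0;
    for (std::size_t i = 1; i <= m; i++) {
        for (std::size_t j = 1; j <= n; j++) {
            if (A[i - 1] == B[j - 1]) {
                cur[j] = prev[j - 1] + 1;  // extend the match ending at (i-1, j-1)
                ans = std::max(ans, cur[j]);
            } else {
                cur[j] = 0;                // no common suffix ends here
            }
        }
        std::swap(prev, cur);              // cur becomes the previous row
    }
    return ans;
}
```

This takes O(mn) time and O(n) extra space. Keeping the full (m+1)-by-(n+1) table instead would allow looking up the match length ending at any pair of positions after the table is built, at the cost of O(mn) space.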

Upvotes: 0
