Reputation: 65

Longest Common Subsequence Algorithm Explanation

So the psuedocode for longest common subsequence problem is listed as below.

longest-common-subsequence(s1, s2):

If the strings begin with the same letter c, the result to return is c plus the longest common subsequence between the rest of s1 and s2 (that is, s1 and s2 without their first letter). For example, the longest subsequence between "hollow" and "hello" is an "h" plus the longest subsequence found between "ollow" and "ello".

Otherwise, if the strings do not begin with the same letter, return the longer of the following two: The longest common subsequence between s1 and the rest of s2 (s2 without its first letter), The longest common subsequence between the rest of s1 (s1 without its first letter) and s2. For example, longest-common-subsequence("ollow", "ello") is the longer of longest-common-subsequence("ollow", "llo") and longest-common-subsequence("llow", "ello").

The part that I don't get is when the strings do not begin with the same letter, why do we take (s1 without the first letter, s2), (s1, s2 without the first letter). Why do we recursively go through these steps when they do not match? Is it just a set algorithm that is hard to understand? What is the reasoning behind this?

Upvotes: 2

Answers (3)

Junbang Huang

Reputation: 1977

While @yash mahajan already covered everything, I'll just provide another way to think about it.

Go through two strings, assume you are at position i on string A (of length m) and position j on string B (of length n).

1. If the current two characters of both string are the same:

longest common subsequence till now = longest common subsequence between substring A[0...i-1] and substring B[0...j-1] + 1.

2. If two characters are different:

longest common subsequence = Max(longest common subsequence between substring A[0...i-1] and string B, longest common subsequence between string A and substring B[0...j-1])

You will have a clearer idea if you read the codes.

public class Solution {

    public int longestCommonSubsequence(String A, String B) {
        if(A == null || B == null || A.length() == 0 || B.length() == 0) {
            return 0;
        }

        int m = A.length();
        int n = B.length();
        int[][] commonSubsequenceLength = new int[m + 1][n + 1];

        for(int i = 1; i <= m; i++) {
            for(int j = 1; j <= n; j++) {
                if(A.charAt(i - 1) == B.charAt(j - 1)) {
                    commonSubsequenceLength[i][j] = commonSubsequenceLength[i - 1][j - 1] + 1;
                } else {
                    commonSubsequenceLength[i][j] = Math.max(commonSubsequenceLength[i][j - 1], commonSubsequenceLength[i - 1][j]);
                }
            }
        }

        return commonSubsequenceLength[m][n];

    }
}

Upvotes: 4

arunk2

Reputation: 2426

To answer your question:

The part that I don't get is when the strings do not begin with the same letter, why do we take (s1 without the first letter, s2), (s1, s2 without the first letter). Why do we recursively go through these steps when they do not match? Is it just a set algorithm that is hard to understand? What is the reasoning behind this?

What is confusing for you is the recursive call, I guess. The whole idea is to reduce the problem to a smaller input set. In this case 1 character less at a time. You have 2 cases for the selected character(1st or last)

There is a match (example "h" in "hollow", "hello"), simply reduce the input size by 1 character in both strings and recursively call the same function.
No match. You have 2 options here - You may consider the first string has extra unwanted character or the 2nd one. Hence, make recursive call for both the cases and choose maximum of them.

Extra details: This problem has properties of a typical Dynamic Programming (DP) Problem.

1) Optimal Substructure

2) Overlapping Sub-problems

Hope it helps!

Upvotes: 0

yash mahajan

Reputation: 61

When we know that the first characters of both the strings don't match, it's clear we can't include the characters in our longest subsequence as we do it in the first case. So the obvious choice we are left with is to neglect both of these characters and search for the rest of two strings for our subsequence. But if you consider this example : "hello" and "ello" , you can clearly see that if we neglect the first characters we are basically neglecting first character of our subsequence ("Ello"). So we go for two cases: 1.We remove first character of first string and search in the second string. 2.We remove first character of second string and search in first string.

And then we take maximum of those two.

Upvotes: 5

Longest Common Subsequence Algorithm Explanation

Answers (3)

Related Questions