Sheldoor
Sheldoor

Reputation: 71

Check if a string has a palindrome subsequence of length k

The string S consists of lowercase English letters. I want to know whether this string contains a palindrome subsequence of length exactly k or not. I want a dynamic programming algorithm that runs in O(k^2) time. Example inputs and outputs:

input: S= "abacd", K = 3 - output = True

input: S= "abacd", K = 4 - output = False

Upvotes: 1

Views: 216

Answers (1)

btilly
btilly

Reputation: 46408

Yes, this is doable in O(k^2).

The first non-obvious trick is that the lower case English alphabet has only 26 characters. So if 25k < n then some character has k of the same character. Therefore we can ignore n because n is O(k).

Next, let's walk through analyzing a slightly more complex string. Like characters = "aabacdbdb", looking for a palindromic subsequence of length 5.

The first step is to find out where the characters are.

by_char = {}
for i, char in enumerate(characters):
    if char not in by_char:
        by_char[char] = []
    by_char[char].append(i)
    if k <= len(by_char[char]):
        # We have a subseq of just char.
        return True

With a fixed sized alphabet, this never does more than O(k) work. With the above string it will give:

by_char = {'a': [0, 1, 3], 'b': [2, 6, 8], 'c': [4], 'd': [5, 7]}

Now let's try to build a palindromic sequence of length 5. We will move the start from left to right. Then use by_char to figure out what kinds of symmetric sequences we form. We're always looking for the sequence ending as far right as possible of any given length.

enumerate("aabacdbdb") gives:

    (0, "a"):
        - by_char["a"] = [0, 1, 3]
          symmetric sequence:
              - length 1 ending at 3
                (add middle, palindrome length 3)

    (1, "a"):
        - by_char["a"] = [0, 1, 3]
          symmetric sequence:
              - length 1 ending at 3
                (add middle, palindrome length 3)

    (2, "b"):
        - by_char["b"] = [2, 6, 8]
          symmetric sequence:
              - length 1 ending at 8 (improved!)
                (add middle, palindrome length 3)

    (3, "a"):
        - by_char["a"] = [0, 1, 3]
          symmetric sequence:
              - length 1 ending at 8
                (add middle, palindrome length 3)

    (4, "c"):
        - by_char["c"] = [4]
          symmetric sequence:
              - length 1 ending at 8
                (add middle, palindrome length 3)

    (5, "d"):
        - by_char["d"] = [5, 7]
          symmetric sequence:
              - length 1 ending at 8
                (add middle, palindrome length 3)
              - length 2 ending at 7
                (add middle, palindrome length 5)
                FOUND IT!!

In this case we could always add the middle. But that only works if there is at least one character between the start and end of the two symmetric sequences. So there had to be a check.

And that is the idea of the dynamic programming algorithm. Here is working code based on exactly this idea.

def has_palindrome_subseq(characters, k):
    # Trivial cases.
    if 0 == k:
        return True

    # Find out where the characters are.
    by_char = {}
    for i, char in enumerate(characters):
        if char not in by_char:
            by_char[char] = []
        by_char[char].append(i)
        if k <= len(by_char[char]):
            # We have a subseq of just char.
            # Note that we always hit this if 25k < n
            return True

    # Keep track of where start will be in by_char["char"]
    # Just saves us having to look for it next time.
    char_idx = {}
    for char in by_char.keys():
        char_idx[char] = 0

    # This is where the current best symmetry is found
    # with a sequence of length ? going no later than
    # start ending as far to the right as possible.
    #
    # The 0 entry is a hack to simplify later logic.
    # Any 1 length symmetry will be accepted, because it is
    # not off of the end of the string.
    best_symmetry_end = [len(characters)]

    # Now let's move along the characters.
    for i, char in enumerate(characters):
        # Did we start reach the longest symmetry_end?
        # If so, get rid of it.
        if best_symmetry_end[-1] <= i:
            best_symmetry_end.pop()

        # Find where we are in by_char[char]    
        idx = char_idx[char]
        char_idx[char] += 1 # For the next time we hit char

        # Move to the next place we find char    
        idx += 1
        # Can we find a longer symmetry? Find the longest!
        while idx + 1 < len(by_char[char]) and by_char[char][idx+1] < best_symmetry_end[-1]:
            idx += 1

        # Did we find a longer symmetric pair?    
        if idx < len(by_char[char]) and by_char[char][idx] < best_symmetry_end[-1]:
            best_symmetry_end.append(by_char[char][idx])
            # Length of the symmetric sequence found.
            length = 2 * len(best_symmetry_end) - 2

            # Maybe add a character in the middle for 1 more?
            if i + 1 < best_symmetry_end[-1]:
                length += 1
            if k <= length:
                return True

        # Try to improve entries in our best_symmetry_end
        j = len(best_symmetry_end) - 1

        # We will always increase idx or decrease j.
        # Each has at most O(k) things, so this inner
        # loop is at most O(k).
        while idx < len(by_char[char]) and 0 < j:
            # Does combining this char pair and the next shortest
            # symmetries get us a symmetry of this length?
            if by_char[char][idx] < best_symmetry_end[j-1]:
                # Find the best we can do with a pair of this char
                while idx+1 < len(by_char[char]) and by_char[char][idx+1] < best_symmetry_end[j-1]:
                    idx += 1
                # Is this a new best?
                if best_symmetry_end[j] < by_char[char][idx]:
                    best_symmetry_end[j] = by_char[char][idx]
                # Either way move along.
                idx += 1
                j -= 1
            else:
                # Can't do this length, move shorter.
                j -= 1

    # We've completed our search and didn't find it.
    return False

Now the complexity analysis. If finding all of the character positions doesn't return in time O(k), then n must be O(k). After that, for up to n characters, we scan through at most O(k) choices of (idx, j). For O(k^2) total work.

Upvotes: 2

Related Questions