discoverAnkit
discoverAnkit

Reputation: 1151

Total number of palindromic subsequences in a string

The question is like this--

For every string given as input, you need to tell the number of subsequences of it that are palindromes (need not necessarily be distinct). Note that the empty string is not a palindrome. For example, the palindromic subsequences of "aab" are:

"a", "a", "b", "aa", and the method returns 4.

I had the Dynamic Programming solution to finding Longest Palindromic Subsequence in mind and therefore tried to take ideas from it. Couldn't really get the solution. May be dynamic programming is not even required. Suggestions please.

And there is one more catch. When the condition "need not necessarily be distinct" is removed, can we still count without actually generating all the palindromic subsequences?

Upvotes: 7

Views: 8343

Answers (3)

Vinay Malik
Vinay Malik

Reputation: 1

Intuitive O(n^3) solution using DP:

Let each state dp(i,j) represents number of palindromic subsequences in string[i...j] Then simple recursive formula is

for k in range i, j-1:
    if(A[j]==A[k]){
        dp(i,j) = dp(i,j) + dp(k+1,j-1);

The Idea is very simple.. For adding a new character check if it is end of a subsequence or not. If there exist same character in the previously computed smaller subproblem, then it add the number of subsequences contained in range (k+1,j-1). Just take care of corner cases. Add one as newly added character is a single character subsequence too. Even if there are no subsequences in the range (k+1,j-1) , you would still get 1 new subsequences of length 2 (like "aa").

Upvotes: 0

j_random_hacker
j_random_hacker

Reputation: 51226

[EDIT 19/10/2015: An anonymous reviewer pointed out a problem with the formula, which prompted me to notice another, even bigger mistake... Now fixed.]

I now see how to drop the solution time down to O(n^2). I'll leave my other answer up in case it's interesting as a stepping-stone to this one. Note: This is (also) only a solution to the first part of the problem; I see no way to efficiently count only distinct palindromic subsequences (PS).

Instead of counting the number of PS that begin and end at exactly the positions i and j, let's count how many begin at or after i and end at or before j. Call this g(i, j).

We can try to write g(i, j) = g(i, j-1) + g(i+1, j) + (x[i] == x[j])*g(i+1, j-1) for the case when j > i. But this doesn't quite work, because the first two terms will double-count any PS that begin after i and end before j.

The key insight is to notice that we can easily calculate the number of PS that begin or end at some exact position by subtracting off other values of g(), and perhaps adding yet more values of g() back on to compensate for double-counting. For example, the number of PS that begin at exactly i and end at exactly j is g(i, j) - g(i+1, j) - g(i, j-1) + g(i+1, j-1): the last term corrects for the fact that both the second and third terms count all g(i+1, j-1) PS that begin after i and end before j.

Every PS that begins at or after i and ends at or before j is in exactly 1 of 4 categories:

  1. It begins after i, and ends before j.
  2. It begins at i, and ends before j.
  3. It begins after i, and ends at j.
  4. It begins at i, and ends at j.

g(i+1, j) counts all PS in category 1 or 3, and g(i, j-1) counts all PS in category 1 or 2, so their sum g(i+1, j) + g(i, j-1) counts all PS in category 2 or 3 once each, and all PS in category 1 twice. Since g(i+1, j-1) counts all PS in category 1 only, subtracting this off to get g(i+1, j) + g(i, j-1) - g(i+1, j-1) gives the total number of PS in category 1, 2 and 3. The remaining PS are those in category 4. If x[i] != x[j] then there are no PS in this category; otherwise, there are exactly as many as there are PS that begin at or after i+1 and end at or before j-1, namely g(i+1, j-1), plus one extra for the 2-character sequence x[i]x[j]. [EDIT: Thanks to commenter Tuxdude for 2 fixes here!]

With this in hand, we can express g() in a way that changes the quadratic case from f() to constant time:

g(i, i) = 1 (i.e. when j = i)
g(i, i+1) = 2 + (x[i] == x[i+1]) (i.e. 3 iff adjacent chars are identical, otherwise 2)
g(i, j) = 0 when j < i (this new boundary case is needed)
g(i, j) = g(i+1, j) + g(i, j-1) - g(i+1, j-1) + (x[i] == x[j])*(g(i+1, j-1)+1) when j >= i+2

The final answer is now simply g(1, n).

Upvotes: 13

j_random_hacker
j_random_hacker

Reputation: 51226

Here's a horrible O(n^4) solution:

Every palindromic subsequence begins at some position i and ends at some position j >= i such that x[i] = x[j], and its "interior" (all characters except the first and last) is either empty or a palindromic subsequence of x[i+1 .. j-1].

So we can define f(i, j) to be the number of palindromic subsequences beginning at i and ending at j >= i. Then

f(i, j) = 0 if x[i] != x[j]
f(i, i) = 1 (i.e. when j = i)
f(i, j) = 1 + the sum of f(i', j') over all i < i' <= j' < j otherwise

[EDIT: Fixed to count palindromic subsequences of length <= 2 too!]

Then the final answer is the sum of f(i, j) over all 1 <= i <= j <= n.

The DP for this is O(n^4) because there are n^2 table entries, and computing each one takes O(n^2) time. (It's probably possible to speed this up to at least O(n^3) by making use of the fact that x[i] != x[j] implies f(i, j) = 0.)

Upvotes: 1

Related Questions