Learner
Learner

Reputation: 21445

Count how many palindromes in a string

I want to find out all possible palindromes that can be possible using substrings from a given string.

Example: input = abbcbbd.
Possible palindromes are a,b,b,c,b,b,d,bb,bcb, bbcbb,bb

Here is the logic I have implemented:

public int palindromeCount(String input) {
    int size = input.length();// all single characters in string are treated as palindromes.
    int count = size;
    for(int i=0; i<size; i++) {
        for(int j=i+2; j<=size; j++) {
            String value = input.substring(i, j);
            String reverse = new StringBuilder(value).reverse().toString();
            if(value.equals(reverse)) {
                count++;
            }
        }
    }
    return count;
}

Here the time complexity is more, how can I improve the performance of this logic?

Upvotes: 5

Views: 10337

Answers (2)

templatetypedef
templatetypedef

Reputation: 373462

If you're comfortable busting out some heavyweight data structures, it's possible to do this in time O(n), though I'll admit that this isn't something that will be particularly easy to code up. :-)

We're going to need two tools in order to solve this problem.

Tool One: Generalized Suffix Trees. A generalized suffix tree is data structure that, intuitively, is a trie containing all suffixes of two strings S and T, but represented in a more space-efficient way.

Tool Two: Lowest Common Ancestor Queries. A lowest common ancestor query structure (or LCA query) is a data structure built around a particular tree. It's designed to efficiently answer queries of the form "given two nodes in the tree, what is their lowest common ancestor?"

Importantly, a generalized suffix tree for two strings of length m can be built in time O(m), and an LCA query can be built in time O(m) such that all queries take time O(1). These are not obvious runtimes; the algorithms and data structures needed here were publishable results when they were first discovered.

Assuming we have these two structures, we can build a third data structure, which is what we'll use to get our final algorithm:

Tool Three: Longest Common Extension Queries. A longest common extension query data structure (or LCE query) is a data structure built around two strings S and T. It supports queries of the following form: given an index i into string S and an index j into string T, what is the length of the longest string that appears starting at index i in S and index j in T?

As an example, take these two strings:

S: OFFENSE
   0123456

T: NONSENSE
   01234567

If we did an LCE query starting at index 3 in string S and index 4 in string T, the answer would be the string ENSE. On the other hand, if we did an LCE query starting at index 4 in string S and index 0 in string T, we'd get back the string N.

(More strictly, the LCE query structure doesn't actually return the actual string you'd find at both places, but rather its length.)

It's possible to build an LCE data structure for a pair of strings S and T of length m in time O(m) so that each query takes time O(1). The technique for doing so involves building a generalized suffix tree for the two strings, then constructing an LCA data structure on top. The key insight is that the LCE starting at position i in string S and j in string T is given by the lowest common ancestor of suffix i of string S and suffix j of string T in the suffix tree.

The LCE structure is extremely useful for this problem. To see why, let's take your sample string abbcbbd. Now, consider both that string and its reverse, as shown here:

 S: abbcbbd
    0123456

 T: dbbcbba
    0123456

Every palindrome in a string takes one of two forms. First, it can be an odd-length palindrome. A palindrome like that has some central character c, plus some "radius" stretching out forwards and backwards from that center. For example, the string bbcbb is an odd-length palindrome with center c and radius bb.

We can count up how many odd-length palindromes there are in a string by using LCE queries. Specifically, build an LCE query structure over both the string and its reverse. Then, for each position within the original string, ask for the LCE of that position in the original string and its corresponding position in the mirrored string. This will give you the longest odd-length palindrome centered at that point. (More specifically, it'll give you the length of the radius plus one, since the character itself will always match at those two points). Once we know the longest odd-length palindrome centered at that position in the string, we can count the number of odd-length palindromes centered at that position in the string: that will be equal to all the ways we can take that longer palindrome and shorten it by cutting off the front and back character.

With this in mind, we can count all the odd-length palindromes in the string as follows:

for i from 0 to length(S) - 1:
    total += LCE(i, length(S) - 1 - i)

The other class of palindromes are even-length palindromes, which don't have a center and instead consist of two equal radii. We can also find these using LCE queries, except instead of looking at some position i and its corresponding position in the reversed string, we'll look at position i in the original string and the place that corresponds to index i - 1 in the reversed string. That can be done here:

for i from 1 to length(S) - 1:
    total += LCE(i, length(S) - i)

Overall, this solution

  • Constructs an LCE query structure from the original string and the reversed string. Reversing the string takes time O(m) and building the LCE query structure takes time O(m). Total time for this step: O(m).
  • Makes 2m - 1 total queries to the LCE structure. Each query takes time O(1). Total time for this step: O(m).

I'm fairly certain it's possible to achieve this runtime without using such heavyweight tools, but this at least shows that a linear-time solution exists.

Hope this helps!

Upvotes: 7

EDToaster
EDToaster

Reputation: 3180

Here are some things you can think about when optimizing this algorithm:

  1. What are palindromes? A palindrome is a symmetrical string, which means it must have a center pivot! The pivot may be one of the following:

    • A letter, as in "aba", or

    • The position between two letters, as in the position between the letters "aa"

That means there are a total of 2n − 1 possible pivots.

Then, you can search outwards from each pivot. Here is an example:

  • Sample string: "abcba"
  • First, let's take a pivot at "c":
    • abcba → c is a palindrome, then widen your search by 1 on each side.
    • abcba → bcb is a palindrome, then widen your search by 1 on each side.
    • abcba → abcba is a palindrome, so we know there are at least 3 palindromes in the string.
  • Continue this with all pivots.

Which this approach, the runtime complexity is O(n2).

Upvotes: 7

Related Questions