unglinh279
unglinh279

Reputation: 673

Longest substring that appears at least twice in O(n.logn)

Problem:

Given a String S of N characters (N <= 200 000), find the length of the longest substring that appears at least twice (the substrings can overlap).

My solution:

Here's what i've tried:

int main()
{
    std::string s;
    std::cin >> s;
    int max = 0;
    typedef std::string::const_iterator sit;
    sit end = s.end();
    for(sit it1 = s.begin(); it1 != end; ++it1)
        for(sit it2 = it1 + 1; it2 != end; ++it2)
            max = std::max(max, std::mismatch(it1, it1 + (end - it2), it2).first - it1);
    std::cout << max;
}

Question:

However, the above solution will get TLE as it runs in O(n^3). Is there any ways to improve it so it can run in O(n.logn)?

Upvotes: 7

Views: 1392

Answers (2)

Y.T.
Y.T.

Reputation: 2729

Suffix tree is an overkill for this problem. In fact, binary search suffices and the implementation is much easier.

Idea

The idea is simple: If there exists a duplicated substring of length N (N > 1), there must also exists one of length N - 1. Therefore, if we let f(x) to denote a function that returns true if there exists a duplicated substring of length x, f(x) will be a monotonic function, which allows a binary search solution.

Implementation

Binary search on the length of the longest duplicated substring and apply sliding windows to check if a given length is possible. Use string hashing to check for duplicate. Complexity: N log N

Upvotes: 3

prakash sellathurai
prakash sellathurai

Reputation: 107

find the length of the longest substring that appears at least twice (the substrings can overlap)

This problem is also commonly known as Longest repeated substring problem. It can be solved in linear time with a suffix tree.

To solve this problem:

  1. Add a special character '$' to the given string S),
  2. build a suffix tree from S ;
  3. the longest repeated substring of S is indicated by the deepest internal node in the suffix tree, where depth is measured by the number of characters traversed from the root.

Time complexity:

  • Suffix tree takes O(nlog(k))) time, where k is the size of the alphabet (if k is considered to be a constant, the asymptotic behaviour is linear)
  • tree traversal(for finding longest repeated substring) can be done in O(n) time

Upvotes: 4

Related Questions