aandis
aandis

Reputation: 4212

Finding the longest sub-string with no repetition in a string. Time Complexity?

I recently interviewed with a company for software engineering position. I was asked the question of longest unique sub-string in a string. My algorithms was as follows -

Start from the left-most character, and keep storing the character in a hash table with the key as the character and the value as the index_where_it_last_occurred. Add the character to the answer string as long as its not present in the hash table. If we encounter a stored character again, I stop and note down the length. I empty the hash table and then start again from the right index of the repeated character. The right index is retrieved from the (index_where_it_last_occurred) flag. If I ever reach the end of the string, I stop and return the longest length.

For example, say the string was, abcdecfg.

I start with a, store in hash table. I store b and so on till e. Their indexes are stored as well. When I encounter c again, I stop since it's already hashed and note down the length which is 5. I empty the hash table, and start again from the right index of the repeated character. The repeated character being c, I start again from the position 3 ie., the character d. I keep doing this while I don't reach the end of string.

I am interested in knowing what the time complexity of this algorithm will be. IMO, it'll be O(n^2).

This is the code.

import java.util.*;
public class longest
{
    static int longest_length = -1;
    public static void main(String[] args) 
    {
        Scanner in = new Scanner(System.in);
        String str = in.nextLine();
        calc(str,0);    
        System.out.println(longest_length);
    }

    public static void calc(String str,int index)
    {
        if(index >= str.length()) return;
        int temp_length = 0;
        LinkedHashMap<Character,Integer> map = new LinkedHashMap<Character,Integer>();
        for (int i = index; i<str.length();i++) 
        {
            if(!map.containsKey(str.charAt(i)))
            {
                map.put(str.charAt(i),i);
                ++temp_length;
            }
            else if(map.containsKey(str.charAt(i)))
            {
                if(longest_length < temp_length)
                {
                    longest_length = temp_length;
                }
                int last_index = map.get(str.charAt(i));
//              System.out.println(last_index); 
                calc(str,last_index+1);
                break;
            }
        }
        if(longest_length < temp_length)
            longest_length = temp_length;
    }
}

Upvotes: 0

Views: 116

Answers (1)

Paul Hankin
Paul Hankin

Reputation: 58211

If the alphabet is of size K, then when you restart counting you jump back at most K-1 places, so you read each character of the string at most K times. So the algorithm is O(nK).

The input string which contains n/K copies of the alphabet exhibits this worst-case behavior. For example if the alphabet is {a, b, c}, strings of the form "abcabcabc...abc" have the property that nearly every character is read 3 times by your algorithm.

You can solve the original problem in O(K+n) time, using O(K) storage space by using dynamic programming.

Let the string be s, and we'll keep a number M which will be the the length of maximum unique_char string ending at i, P, which stores where each character was previously seen, and best, the longest unique-char string found so far.

Start:

Set P[c] = -1 for each c in the alphabet.
M = 0
best = 0

Then, for each i:

M = min(M+1, i-P[s[i]])
best = max(best, M)
P[s[i]] = i

This is trivially O(K) in storage, and O(K+n) in running time.

Upvotes: 1

Related Questions