Reputation:

Code complexity of longest non-repeating substring algorithm

Problem:

Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", which the length is 3.

Solution:

int lengthOfLongestSubstring(char* s) {
    if (s == NULL) {
        return 0;
    }
    myHash  hash;
    int start = 0;
    int end = 0;
    int maxLen = 0;
    while (s[end] != '\0') {
        char c = s[end];
        if (hash.find(c) == hash.end()) {
            hash[c] = end;
            end++;
        }else {
            if (end - start > maxLen) {
                maxLen = end - start;
            }
            int index = hash[c];
            while(start <= index) {
                hash.erase(s[start]);
                start++;
            }
        }
    }
    if (end - start > maxLen) {
        maxLen = end - start;
    }
    return maxLen;
}

Complexity:

Some one said this algorithm complexity is O(n) but I think it didn't consider the second loop, it's less than O(n^2) but should be bigger than O(n).

How should we analyze the worst case complexity?

Upvotes: -1

Answers (3)

John Bollinger

Reputation: 181714

Asymptotic complexity analysis is trickier than just counting loops.

Before we go any further, however, do note that if an algorithm's asymptotic complexity is O(n), then it is also O(n²), as the latter expresses a weaker bound. Presumably you're looking for the tightest bound.

So let's now look at sum of all the operations performed in all the iterations of the outer loop. On each iteration, hash.find(c) == hash.end() is computed. We assume each such evaluation costs O(1). Then there are two alternatives:

the current character, s[end], is distinct from all others in the current substring, as judged by the hash lookup. In this case, the current character is added to the current substring, and end is incremented.
the current character duplicates a previous character in the current substring. In this case, the previous appearance of the current character and each preceding character in the current substring is removed from the substring.

Each of the n characters in the original string is added to the current string once, and removed from it at most once, and the actions involved in each individual character's addition or removal cost O(1). The remaining operations performed in the outer loop cost O(1) per iteration. Every iteration of the loop performs either one addition or at least one removal, so there cannot be more than 2*n iterations altogether. That gives the loop nest a worst-case cost of n * O(1) [for additions] + n * O(1) [for removals] + 2 * n * O(1) [for other operations], for a total cost bounded by 4 * n * O(1) = O(4n) = O(n).

Upvotes: 1

Sergey Kalinichenko

Reputation: 727047

The complexity of this algorithm is indeed O(n).

Proof: Consider the values of start and end. Each iteration of the outer loop performs one of two actions:

advances end by 1, or
advances start in a loop one or more times.

The number of times that end is advanced is equal to n, the length of the input. The total number of times that start is advanced cannot exceed n, because index is never set to a number above n, and start is never decremented.

Hence, the total number of times the outer loop runs is n, and the total number of times the inner loop runs is n, yielding time complexity of O(n).

Note that the inner loop does not do all n iterations at once. Its iterations are "distributed" among the iterations of the outer loop. However, if you total up the number of iterations of the inner loop, during all iterations of the outer loop, they are not going to exceed n.

Upvotes: 1

Marshall Conover

Reputation: 845

If I'm understanding your algorithm and data structure used correctly, consider the worse-case scenario: we get a string of the same character over and over again, and have to erase every time. In that case, we've performed just about 2n operations for the string. This is still linear growth, granted with a constant in front of it. So, for the purposes of understanding how it grows, we're still linear growth - but, as you noted, in between n and n^2, since we're about 2n worst-case.

Upvotes: 0

Code complexity of longest non-repeating substring algorithm

Problem:

Solution:

Complexity:

Answers (3)

Related Questions