asndonsadoasndo231213
asndonsadoasndo231213

Reputation: 225

Time Complexity of "Find the longest substring with k-unique characters"?

I'm working with the algorithmic problem called "Find the longest substring with K unique characters". Take string “aabbccdd” as an example. if K is 1, the longest substring can be “aa”. If K is 2, the longest substring can be “aabb”. If K is 3, the longest substring can be “aabbcc”.

n this problem, you can take use of sliding window. Use a start pointer and an end pointer to represent a substring in a sliding window, if the current substring has K unique characters or less, it means the string can potentially be longer and you just need to move forward the end pointer by one character.

If after moving forward the end pointer the current substring exceeds K unique characters, it indicates that the previous substring is a potential candidate that has K unique characters. In this case, you just need to move forward the start pointer until the current substring has K or less unique characters.

I don't understand why the time complexity of this is linear. Off the surface it appears to be quadratic. We have two pointers. Pointer i could potentially be an linear scan. j remains fixed through the linear scan. So this appears quadratic to me.

TLDR: Can someone clearly deduce the linear time complexity for me? This isn't obvious to me. I like the notion of a sliding window to implictly iterate over the N^2 substrings. Then leveraging properties of the problem to set the start and end points to attempt to make this linear. But I don't see how this is linear. This looks quadratic in my analysis

Upvotes: 3

Views: 812

Answers (2)

Olivier Melançon
Olivier Melançon

Reputation: 22314

An algorithm being O(n) means that the number of operations is eventually bounded by some linear function a * n where a is a constant.

In the case of algorithms that rely on list traversal it does not mean that they must traverse the list only once. It means they must traverse the list at most a times and that a does not depend on the list length.

In your case, you have two pointers, but they each traverse the list only once. This means you always traverse the list only twice.

One more remark is that while the number of substrings in a n-string grows by O(n^2), you are not traversing all of them. Indeed only up to n of them are valid and these are the only ones you are inspecting with the two-pointers traversal.

By example in the string 'abcdef' with k = 3, you do not check the substring 'abcd'.

Upvotes: 2

Joe Iddon
Joe Iddon

Reputation: 20414

To deal with time complexity problems, it always helps to scale the problem up and think of a massive case.

If your string was thousands of characters long, we still only have one start pointer and one end pointer. The key thing is that both the pointers will only ever move forward (along the string). Therefore, the complexity of this is definitely O(n) since they are only moving forward together through the string - so the time this process would take is proportional to the length of the string (the time it takes to get to the end).


A time complexity of O(n^2) would occur if we were to check all the windows for the end pointer from the start pointer to the end of the string, and then increment the start pointer and check all those substrings now again (from the start pointer to the end of the string). Here, we see that as we iterate over the string with the start pointer (O(n)), each iteration involves its own, nested iteration of moving the end pointer to the end of the string. That would make the complexity O(n^2).

However, as previously stated, both your pointers only move forward, so at the most, this is of O(n) complexity.

Upvotes: 2

Related Questions