Hello.World
Hello.World

Reputation: 740

Time Complexity of String Comparison

I ran some test to determine if O(==) for Strings is O(len(string)) or O(1).

My tests:

import timeit
x = 'ab' * 500000000
y = 'ab' * 500000000
%timeit x == y

> 163 ms ± 4.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

x = 'ab' * 5000
y = 'ab' * 5000
%timeit x == y

> 630 ns ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Looking at the above results I understand that string comparison is linear O(N) and not O(1).

However, I was reading this document: Complexities of Python Operations

The part:

Finally, when comparing two lists for equality, the complexity class above shows as O(N), but in reality we would need to multiply this complexity class by O==(...) where O==(...) is the complexity class for checking whether two values in the list are ==. If they are ints, O==(...) would be O(1); if they are strings, O==(...) in the worst case it would be O(len(string)). This issue applies any time an == check is done. We mostly will assume == checking on values in lists is O(1): e.g., checking ints and small/fixed-length strings.

This says the worst case for strings would be O(len(string)). My question is why worst case? Shouldn't the best/average case be O(len(string))?

Upvotes: 3

Views: 5165

Answers (1)

Netwave
Netwave

Reputation: 42786

The algorithm is simple, you check the strings char by char, so:

Hello == Hello => They are equal...so it is actually the worst case because you check all the chars from both strings
Hello != Hella => Still worst case, you realize they are different in the last char of the strings.
hello != Hello => Best case scenario, the first char for both (h != H) are different, so you stop checking them there.

Upvotes: 4

Related Questions