asdf
asdf

Reputation: 53

Why is the hash function O(1)

Why does finding the hash of a given string only run in constant time?

I am trying to write an optimized program to compare two strings by using string hashing.

From what I know, the hash of a string is usually defined by a polynomial rolling hash function. Online sources say calculating this hash and doing the comparison is O(1). For example, https://cp-algorithms.com/string/string-hashing.html says that

The idea behind the string hashing is the following: we map each string into an integer and compare those instead of the strings. Doing this allows us to reduce the execution time of the string comparison to O(1).

However, the actual implementation (according to https://cp-algorithms.com/string/string-hashing.html) of this hash function includes looping through the entire string:

long long compute_hash(string const& s) {
const int p = 31;
const int m = 1e9 + 9;
long long hash_value = 0;
long long p_pow = 1;
for (char c : s) {
    hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
    p_pow = (p_pow * p) % m;
}
return hash_value;
}

Wouldn't this guarantee linear time complexity to compare two strings, instead of O(1)? Any help is appreciated!

Upvotes: 4

Views: 1076

Answers (2)

Goswin von Brederlow
Goswin von Brederlow

Reputation: 12332

The idea behind the string hashing is the following: we map each string into an integer and compare those instead of the strings. Doing this allows us to reduce the execution time of the string comparison to O(1).

This sentence is quite misleading.

As you figured out computing the hash of a string takes O(n) time. But once you have computed the hash you can compare strings by comparing their hashes. The paragraph only refers to the comparison of hashes, which is O(1).

But even then it's false to say the string comparison becomes O(1). The string comparison is only O(1) when the hash doesn't match. A matching hash does not guarantee the strings are the same and you have to double check by comparing the actual strings, which is O(n).

So what the hashing gives you is the ability to quickly detect that 2 strings are not equal in most cases. If you only want to compare 2 strings then it is utterly pointless. String hashing is only worth it if you compare many times. You need to compute the hash, store it and use it more than once to get any benefit.

PS: Some application pretend that hash collisions don't happen and that 2 strings are equal if their hashes match. When using a strong cryptographic hash this is a near certainty. But a collision is not impossible, they are playing russian roulette.

Upvotes: 2

Jeremy Friesner
Jeremy Friesner

Reputation: 73081

Big-O notation is a way of describing how the execution time of an algorithm will grow with the size of the data-set it is working on. In order for that definition to be applied, we have to specify what the data-set is that will be growing.

In most cases that's obvious, but sometimes it's a bit ambiguous, and this is one of those cases. Does "data set" in this case refer to an individual string? If so, then the algorithm is O(N), since the sequence of operations it has to perform grows linearly with the size of the string. But if by "data set" you mean the number of items in the associated hash table, then the algorithm can be considered O(1), since it will operate just as quickly (for any given string) when dealing with a million-entry Hashtable as it would for a two-entry Hashtable.

Regarding using hashing to compare two strings: once you have computed the hash values for the two strings, and stored those hash values somewhere, you can compare the two hash values in O(1) time (since comparing two integers is a fixed-cost operation, regardless of the sizes of the strings they were calculated from). It's important to note, however, that this comparison can only tell you if the two strings are different -- if the two hash values are equal, you still can't be 100% certain that the two strings are equal, since (due to the pigeonhole principle) two different strings can hash to the same value. So in that case you'd still need to compare the strings the regular O(N)/char-by-char way.

Upvotes: 15

Related Questions