How good is hash function that is linear combination of values?

Question

I was reading text about hashing , I found out that naive hash code of char string can be implemented as polynomial hash function

h(S0,S1,S2,...SN-1) = S0*A^N-1 + S1*A^N-2 + S2*A^N-3 ..... SN-1*A^0. Where Si is character at index i and A is some integer.

But cannot we straightaway sum as

h(S0,S1,S2,...SN-1) = S0*(N)+S1*(N-1)+S2*(N-2) ...... SN-1*1.

I see this function also as good since two values 2*S0+S1 != 2*S1+S0 (which are reverse) are not hashed to same values. But nowhere i find this type of hash function

user555045 · Accepted Answer

Suppose we work with strings of 30 characters. That's not long, but it's not so short that problems with the hash should arise purely because the strings are too short.

The sum of the weights is 465 (1+2+...+30), with printable ASCII characters that makes the maximum hash 58590, attained by "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~". There are a lot more possible printable ASCII strings of 30 characters than that (95³⁰ ≈ 2E59), but they all hash into the range of 0 to 58590. Naturally you cannot actually have that many strings at the same time, but you could have a lot more than 58590, and that would guarantee collisions just based on counting (it is very likely to happen much sooner of course).

The maximum hash grows only slowly, you'd need strings of 34 million characters before the entire range of a 32bit integer is used.

The other way, multiplying by powers of A (this can be evaluated with Horner's scheme so no powers needs to be calculated explicitly, it still only costs an addition and a multiplication per character, though the naive way is not the fastest way to compute that hash), does not have this problem. The powers of A quickly get big (and start wrapping, which is fine as long as A is odd), so strings with 30 characters stand a good chance to cover the entire range of whatever integer type you're using.

How good is hash function that is linear combination of values?

Answers (2)

Related Questions