Reputation: 331
Hi can anybody suggest a hash function that would take a list of integers and returns a new integer
?
It should be fast to evaluate and more or less collision resistant.
I plan to use it in an approximate search algorithm (e.g. LSH)
Java's hashCode()
for a list uses this formula:
31 + SUM 31^(i+1) *a[i]
does anybody know why it is collision resistant? I guess it is about 31 being a prime but no idea how to prove it.
Upvotes: 0
Views: 1051
Reputation: 74750
You got your formula wrong (it counts backwards), it is actually:
SUM 31^(n-1-i) * a[i]
where n
is the length of the list, and we also use a[-1] = 1. Or, if you want to have it separately,
31^n + SUM 31^(n-1-i) * a[i]
(And the result taken modulo 2^32, as usual for Java's ints.)
Java's hashCode()
for List (specified in java.util.List, and supposed to be implemented by every implementation of this class) is not collision resistant in the crypto sense. That is, it is not hard to find a collision.
Given any list of integers with more than one element, we can increment one of them by 1 and decrement the next one by 31 (or the other way around), and have a second list with the same hash code.
For example, the two lists [1, 0]
and [0, 31]
have the same hash code 992 = 31·32 = (1·31 + 1)·31 + 0 = (1·31 + 0)·31 + 31
.
It has some weak resistance against accidental collisions, which is indeed related to the fact that 31 is a prime (i.e. has no real divisors), and "naturally occurring" lists of integers (or hashcodes of other objects) do not tend to differ by just this amount.
Of course, if we build lists of lists, each of which uses the same strategy for hash-codes, we get collisions easily: [ [0, 1], [0, 0] ]
and [ [0, 0], [1, 0] ]
have the same hash code 31³+2·31²+31 = 31744, too.
Upvotes: 1