HukeLau_DABA
HukeLau_DABA

Reputation: 2528

No such thing as a perfect hash function?

according to http://java-bytes.blogspot.com/2009/10/hashcode-of-string-in-java.html: "First off, its a known fact that there is no perfect hashing algorithm, for which there are no collisions."

The author is talking practically and not theoretically right? Because theoretically, here is a perfect hash function: "for a given object, assign it a new number". There are an infinite amount of numbers, so we'll always have something to assign to an object that's unique. In practice this isn't feasible though because we have a limited amount of memory.

Upvotes: 0

Views: 1243

Answers (1)

templatetypedef
templatetypedef

Reputation: 373082

Typically, a hash function maps from one set of objects (the universe) to a smaller set of objects (the codomain). Commonly, the universe is an infinite set, such as the set of all strings or the set of all numbers, and the codomain is a finite set, such as the set of all 512-bit strings, or the set of all numbers between 0 and some number k, etc. In Java, the hashCode function on objects has a codomain of values that can be represented by an int, which is all 32-bit integers.

I believe that what the author is talking about when they say "there is no perfect hash function" is that there is no possible way to map the infinite set of all strings into the set of all 32-bit integers without having at least one collision. In fact, if you pick 232 + 1 different strings, you're guaranteed to have at least one collision.

Your argument - couldn't we just assign each object a different hash code? - makes the implicit assumption that the codomain of the hash function is infinite. For example, if you were to try this approach to build a hash function for strings, the codomain of the hash function would have to be at least as large as the set of all possible natural numbers, since there are infinitely many strings. Most programming languages don't support hash codes that work this way, though you're correct that in theory this would work. Of course, someone might object and say that this doesn't count as a valid hash function, since typically hash functions have finite codomains.

Hope this helps!

Upvotes: 11

Related Questions