Reputation: 1962
I'm currently covering hashing and hash tables and I'm wondering why something like the following is considered a bad hash function (pseudocode):
function hash(String_t word, Int table_size)
i = randomly generated number with 0<i<table_size
j = ASCII code of the first letter of word
return i * j % table_size
Assuming that the value of i
could be stored during function invocations to achieve consistency (for example using the static
keyword in C to store the value of i
within the function scope), why is this a bad hash function?
Upvotes: 0
Views: 5501
Reputation: 6629
A good hash function should work well for various input sizes, with only the condition that the table size is a constant times bigger than the number of inputs. This does not meet that criteria for a few reasons:
The hash value is determined only by the first letter. Hence, the total number of possible hash values is bounded by the number of possible first letters, which is small. Choosing a large table size for a large number of inputs has no effect: you will still get heaps of collisions.
Since first letters of words are very far from being uniformly distributed, you will have a lot of collisions. At least use all letters of the word when defining your function, but you really need more than that advice to rescue this construct.
Define d = gcd(i, table size). In some cases, d will be greater than 1, and in those cases, only one in every d elements of the table will have a chance to be populated: the others will be wasted space (and hence more collisions). That is, only 0, d, 2d, 3d, ... Could possibly be a hash value. At least restrict to values of i with d=1 to prevent these degenerate cases.
i multiplied by the largest value of j will occasionally be smaller than the table size (when i is small) implying that the top end of the table will never get touched. More wasted space.
People usually try to come up with hash functions that work well in general and that you can prove something good about them. Here you have something intended for a very specific case and what is most visible to me is the negative cases, so very very doubtful that you can prove anything positive about this construct.
Upvotes: 2