Reputation: 1840
Preferably a quick way. The N = 2^b
case is pretty easy. For it, first I'd figure out how many bits are in my chosen type:
typedef unsigned int type;
size_t size = sizeof(type) * 8;
Then I would perform a right shift by the proper amount of bits to produce a hash key of the upper b
bits.
type input = 0x657;
unsigned char b = 4;
unsigned char hash = input >> (size - b);
But what if I wanted N = 3
? Or any other non power of 2? Assuming my N
will always fit inside an unsigned char
(so it will be 256 at most), what would be the quickest way to hash some input
? While keeping the buckets no more than +/- 1 difference in range, and also preserving the order of the upper bits of input
, like the above function does.
Upvotes: 3
Views: 223
Reputation: 241691
For 32-bit values, do a 64-bit multiply with N
and keep the top 32 bits. (Similarly for other sizes, although if you have 64-bit values the multiplication gets trickier.)
Here's a basic proof outline.
It's obvious that this mapping is order-preserving; the only question is how many values fall into each bucket. Now, consider some bucket j
and find the smallest i
which maps to that bucket. For i
to fall into bucket j
means that Ni − j×232 = m
where 0 ≤ m < 232
, but if i
is the smallest such value, 0 ≤ m < N
. (Otherwise, i−1
would also fall into bucket j
.)
Now, define w = ⌊232∕N⌋
, which is equivalent to saying that Nw − 232 = −m' where 0 ≤ m' < N
. By adding those two formulas up, we discover that Ni + Nw - j×232 - 232 = m−m'
; simplifying, we get N(i+w)-(j+1)×232 = m−m'
and −N < m−m' < N
. That means that either i + w
or i + w + 1
is the smallest value which maps to j + 1
(depending on whether m − m'
is negative or not), which means that there are either w
or w + 1
values which map to j
. Since that's true for any j
, we can definitely say that there are only two bucket sizes, one of which is ⌊232∕N⌋
. It's not far from there to the assertion I made in my comment (now deleted) which is that the other possible bucket size is ⌈232∕N⌉
.
There's nothing magic about 232
in the above proof; I could have used any value M
. But that would have made the condensed proof even hard to read.
Upvotes: 3