Reputation: 605
I am in the process of implementing a hash table and hence hash function in C and heard that Murmurhash was a suitably fast algorithm for this purpose. Looking up some C code for this provided:
uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed) {
static const uint32_t c1 = 0xcc9e2d51;
static const uint32_t c2 = 0x1b873593;
static const uint32_t r1 = 15;
static const uint32_t r2 = 13;
static const uint32_t m = 5;
static const uint32_t n = 0xe6546b64;
uint32_t hash = seed;
const int nblocks = len / 4;
const uint32_t *blocks = (const uint32_t *) key;
int i;
for (i = 0; i < nblocks; i++) {
uint32_t k = blocks[i];
k *= c1;
k = (k << r1) | (k >> (32 - r1));
k *= c2;
hash ^= k;
hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
}
const uint8_t *tail = (const uint8_t *) (key + nblocks * 4);
uint32_t k1 = 0;
switch (len & 3) {
case 3:
k1 ^= tail[2] << 16;
case 2:
k1 ^= tail[1] << 8;
case 1:
k1 ^= tail[0];
k1 *= c1;
k1 = (k1 << r1) | (k1 >> (32 - r1));
k1 *= c2;
hash ^= k1;
}
hash ^= len;
hash ^= (hash >> 16);
hash *= 0x85ebca6b;
hash ^= (hash >> 13);
hash *= 0xc2b2ae35;
hash ^= (hash >> 16);
return hash;
}
I was wondering if I could clarify a few things with regard to the arguments that are being passed here. "Key" is obviously the string that you are hashing. If this is defined in a struct as having an array length of 46, would this be the value that I would pass as "length" in the above function? The argument "seed", I take it this can be any arbitrary value as long it stays constant between hash calls? Are there any other parameters that I need to change keeping in mind that I am working on a 32-bit machine?
I take it I will also need to modulo the return hash by the size of my hash table?
In addition, if anyone could recommend a superior/faster alternative hash function used for strings then that would be much appreciated
Thanks in advance
Upvotes: 5
Views: 6122
Reputation:
About the question regarding the parameters: yes, just read the code, your assumptions are correct.
You don't need modulo as long as the size of your hash table is a power of 2. Then you can just use a bitmask, e.g. (pseudocode)
void* hashtbl[1<<8]; /* 256 */
int key = hash(value, ...) & ((1<<8) - 1); /* 0xff */
Then keep in mind that performance is not the only relevant characteristic of a hash function. It's very important to get an equal distribution of the whole key space. I can't tell you how "good" murmurhash is in that respect, but probably much better than a very simple hashing I used resently for playing around a bit:
static unsigned int
hash(const void *key, size_t keyLen, unsigned int hashmask)
{
size_t i;
unsigned int h = 5381;
for (i=0; i<keyLen; ++i)
{
h += (h << 5) + ((const unsigned char *)key)[i];
}
return h & hashmask;
}
although this simple function is probably faster. It's a tradeoff and a "clever" hashing algorithm tries to be as fast as possible while still giving good distribution. The simplistic function above doesn't really give good distribution, for example it will never use the whole key space for small input (less than 5 bytes).
Upvotes: 1