SuperString
SuperString

Reputation: 22519

hashing a dictionary in C++

hi I want to use a hashmap for words in the dictionary and the indices of the words in the dicionary.

What would be the fastest hash algorithm for this?

Thanks!

Upvotes: 0

Views: 1513

Answers (6)

Rob deFriesse
Rob deFriesse

Reputation: 126

What is your use case? A radix search tree (trie) might be more suitable than a hash if you're mapping from string to integer. Tries have the advantage of reducing key comparisons for variable length keys. (e.g., strings)

Even a binary search tree (e.g., STL's map) might be superior to a hash based container in terms of memory use and number of key comparisons. A hash is more efficient only if you have very few collisions.

Upvotes: 0

Macke
Macke

Reputation: 25680

boost has a hash function that you can reuse for your own data (predefined for common types). That'd probably work well & fast enough if your needs aren't special.

Upvotes: 0

czuger
czuger

Reputation: 914

Have you tried just using the STL hash_map and seeing if it serves your needs before rolling anything more complex?

http://www.sgi.com/tech/stl/hash_map.html

Upvotes: 1

SingleNegationElimination
SingleNegationElimination

Reputation: 156158

The fastest hash function will be

template <class T>
size_t hash(T key) {
    return 0;
}

however, though the hashing will be mighty fast, you will suffer performance elsewhere. What you want is to try several hashing algorithms on actual data and see which one actually gives you the best performance in aggregate on the actual data you expect to use if the hashing or lookup is even a performance bottleneck. Until then, go with something handy. MD5 is pretty widely available.

Upvotes: 1

Leandro T. C. Melo
Leandro T. C. Melo

Reputation: 4022

At the bottom of this page there is a section A Note on Hash Functions with some information which you might find useful.

For convenience, I'll just replicate some links here:

Upvotes: 3

Larry Watanabe
Larry Watanabe

Reputation: 10184

There are many different hashing algorithms, of varying efficiency, but the most important issue is that it scatter the items fairly uniformly across the different hash buckets.

However, you may as well assume that the Microsoft engineers/library engineers have done a decent job of writing an efficient and effective hash algorithm, and just using the built-in libraries/classes.

Upvotes: 1

Related Questions