AMM

Reputation: 17920

Hashing a user defined type for use in unordered map

Say I have a user-defined type:

struct Key
{
  short a;
  int b, c, d;
};

And I would like to use this as a key in an unordered map. What's a good (efficient) hashing technique, given that I might need to do a lot of reads? Is there something using hash_combine or hash_append that I should be doing?

Upvotes: 0

Views: 148

Answers (2)

mefyl

Reputation: 263

The safest path is probably to reuse standard hashing for your atomic types and combine them as you suggested. AFAIK there are no hash combination routines in the standard, but Boost does provide one:

#include <boost/functional/hash.hpp>
#include <functional>

namespace std
{
  template<>
  struct hash<Key>
  {
    std::size_t
    operator()(Key const& k) const
    {
      std::size_t hash = 0;
      boost::hash_combine(hash, std::hash<short>()(k.a));
      boost::hash_combine(hash, std::hash<int>()(k.b));
      boost::hash_combine(hash, std::hash<int>()(k.c));
      boost::hash_combine(hash, std::hash<int>()(k.d));
      return hash;
    }
  };
}

If depending on Boost is not an option, their hash combination routine is small enough to be reasonably and shamelessly stolen:

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}

If your four integral values are purely random (i.e. they can take any value in their range with equal probability), this is probably very close to optimal. If your values are more specific, for instance one of them has only three possible values, or they are correlated, you could do slightly better. However, this will perform "well" in any circumstance.

Anyway, I don't think you should be too worried unless you're doing something extremely specific, or at least until actual performance issues arise. You can still change the hashing algorithm then with no other impact.

Upvotes: 3

Daniel

Reputation: 1051

The main issue is that you need to reduce the number of equal hash values for different keys as much as possible. Depending on the actual values you can use different approaches (from a simple XOR up to a CRC).

The critical factors are:

- the range of the values
- the typical distribution of the values
- the number of elements in the map

If you use a "simple" approach: Be sure to actually check the content of your map to ensure that the items are equally distributed over all different buckets.

If you use a "complex" approach: Be sure to check that it doesn't have too big a performance impact (usually not a problem, but if it is, you may want to "cache" the hash value...)

Upvotes: 0
