SchrodingersCat

Reputation: 11

C++ : Writing a custom hash function for unordered_set that uses the number of buckets in the hash table

I am writing a custom hash function for class Coord (2-dimensional coordinates).

Is it possible to change the following hash function so that b is the current number of buckets in the unordered_set hash table, and changes if the number of buckets is altered?

namespace std
{
    template <>
    struct hash<Coord>
    {
        size_t operator()(const Coord &k) const
        {
            int b = 11;

            int a1 = static_cast<int> (pow(b,(1.0/3.0)));
            int a2 = static_cast<int> (pow(b,(2.0/3.0)));

            return ((a1*k.getX() + a2*k.getY()) % b);
        }
    };
}

Upvotes: 1

Views: 760

Answers (2)

vitaut

Reputation: 55534

I don't think it is a good idea, because if the hash table grows, all your hashes will change, invalidating the positions of existing elements. Just return a1*k.getX() + a2*k.getY(), and the hash table implementation will do the modulo part for you.
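That advice can be sketched concretely. The Coord below is a hypothetical stand-in for the asker's class (assumed to expose int getX()/getY()); the hasher combines the two coordinates across the full range of size_t and never applies % bucket_count, leaving the bucket reduction to the container:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for the asker's Coord class.
struct Coord {
    int x, y;
    int getX() const { return x; }
    int getY() const { return y; }
    bool operator==(const Coord &o) const { return x == o.x && y == o.y; }
};

namespace std {
template <>
struct hash<Coord> {
    size_t operator()(const Coord &k) const {
        // Combine the coordinates without any % bucket_count: the
        // unordered container reduces the hash to a bucket index itself,
        // and re-reduces automatically whenever it rehashes.
        size_t h1 = hash<int>{}(k.getX());
        size_t h2 = hash<int>{}(k.getY());
        return h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1 << 6) + (h1 >> 2));
    }
};
}
```

The mixing formula is one common boost-style combine, not something prescribed by the answer; any function that spreads values evenly over size_t will do.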

That said, you can get the number of buckets via std::unordered_set::bucket_count() and store it in your hash object (just for illustration; don't do it):

struct MyHash {
  std::size_t bucket_count;
  std::size_t operator()(const Coord &k) const {
    int a1 = static_cast<int>(pow(bucket_count, 1.0/3.0));
    int a2 = static_cast<int>(pow(bucket_count, 2.0/3.0));
    return (a1*k.getX() + a2*k.getY()) % bucket_count;
  }
};

Upvotes: 0

Richard Hodges

Reputation: 69864

The only portable and efficient method is to compute hashes that are as evenly distributed through the range of std::size_t as possible. For a given key, it's important that the hash function returns the same hash code for the duration of the program.

As the unordered map grows, it will rehash itself. Since keys are immutable, there is no way to communicate the new bucket count to the keys so they can compute new hashes (which in any case would be subject to the modulo in the map).

Going further:

Seeking to communicate a bucket count to the key (via references or mutable data members, for example) will only end in tears, and would be an error.

One problem is that this would couple this key class to the map class - that's bad enough, but...

Worse still, the unordered map does not warn you that it is about to rehash. You'd have to discover the rehash after inserting an item, at which point every item already in the map has a hash based on the old bucket count. Attempting to insert a duplicate would then most likely succeed, breaking the semantics of the map!
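The silent rehash is easy to observe: bucket_count() grows as elements are inserted, and the hasher is never notified. A minimal sketch (using std::unordered_set&lt;int&gt; for brevity; the helper name is mine, not from the answer):

```cpp
#include <cstddef>
#include <unordered_set>

// Insert n elements and return the resulting bucket count. The
// container rehashes behind the scenes; no callback reaches the hasher.
std::size_t bucket_count_after(int n) {
    std::unordered_set<int> s;
    for (int i = 0; i < n; ++i)
        s.insert(i);
    // With the default max_load_factor of 1.0, the standard guarantees
    // size() <= bucket_count() here, so the table must have grown.
    return s.bucket_count();
}
```

Any hash that baked in the original bucket count would already be stale by the time you could react.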

To make this work, after every insert you'd have to extract all items into a vector, recompute their hashes, and reinsert them.

Horrific!!!

Please tell me that I have convinced you not to walk this path of doom.

Upvotes: 1
