skr
skr

Reputation: 944

C++ Hash Table - How is collision for unordered_map with custom data type as keys resolved?

I have defined a class called Point which is to be used as a key inside an unordered_map. So, I have provided an operator== function inside the class and I have also provided a template specialization for std::hash. Based on my research, these are the two things I found necessary. The relevant code is as shown:

class Point
{
    int x_cord = {0};
    int y_cord = {0};
public:
    Point()
    {

    }
    Point(int x, int y):x_cord{x}, y_cord{y}
    {

    }
    int x() const
    {
        return x_cord;
    }
    int y() const
    {
        return y_cord;
    }
    bool operator==(const Point& pt) const
    {
        return (x_cord == pt.x() && y_cord == pt.y());
    }
};

namespace std
{
    template<>
    class hash<Point>
    {
    public:
        size_t operator()(const Point& pt) const
        {
            return (std::hash<int>{}(pt.x()) ^ std::hash<int>{}(pt.y()));
        }
    };
}

// Inside some function
std::unordered_map<Point, bool> visited;

The program compiled and gave the correct results in the cases that I tested. However, I am not convinced if this is enough when using a user-defined class as key. How does the unordered_map know how to resolve collision in this case? Do I need to add anything to resolve collision?

Upvotes: 2

Views: 3118

Answers (2)

Cris Luengo
Cris Luengo

Reputation: 60474

Knowing that Point is intended to store coordinates within an image, the best hash function here is:

pt.x() + pt.y() * width

where width is the width of the image.

Considering that x is a value in the range [0, width-1], the above hash function produces a unique number for any valid value of pt. No collisions are possible.

Note that this hash value corresponds to the linear index for the point pt if you store the image as a single memory block. That is, given y is also in a limited range ([0, height-1]), all hash values generated are within the range [0, width* height-1], and all integers in that range can be generated. Thus, consider replacing your hash table with a simple array (i.e. an image). An image is the best data structure to map a pixel location to a value.

Upvotes: 3

rici
rici

Reputation: 241701

That's a terrible hash function. But it is legal, so your implementation will work.

The rule (and really the only rule) for Hash and Equals is:

  • if a == b, then std::hash<value_type>(a) == std::hash<value_type>(b).

(It's also important that both Hash and Equals always produce the same value for the same arguments. I used to think that went without saying, but I've seen several SO questions where unordered_map produced unexpected results precisely because one or both of these functions depended on some external value.)

That would be satisfied by a hash function which always returned 42, in which case the map would get pretty slow as it filled up. But other than the speed issue, the code would work.

std::unordered_map uses a chained hash, not an open-addressed hash. All entries with the same hash value are placed in the same bucket, which is a linked list. So low-quality hashes do not distribute entries very well among the buckets.

It's clear that your hash gives {x, y} and {y, x} the same hash value. More seriously, any collection of points in a small rectangle will share the same small number of different hash values, because the high-order bits of the hash values will all be the same.

Upvotes: 6

Related Questions