A M
A M

Reputation: 15265

Problem with "equality" in std::unordered_set

I see some behaviour in std::unordered_set that I need to understand better.

I created a struct that also provides a "hash" function and an "equality" function. The struct contains values for an interval, the lower and upper bound. I use the struct in the std::unordered_set definition also as the "hash" and "equal_to" template argument. Many will not like it, but for the MRE it is OK. No need to write separate functors.

I want to add only disjoint intervals to a std::unordered_set. So, I use a selfmade disjoint-function and consider 2 intervals to be equal, if they overlap.

I have written some test function (MRE) below.

Why can I insert intervals [1,5] and [2,3] although they overlap and are considered to be equal?

Maybe becuase of the different hash values and the nature of the underlying hash table in the std::unordered_set? Please kindly confirm.

#include <iostream>
#include <fstream>
#include <vector>
#include <unordered_set>

struct Interval {
    int left{};
    int right{};
    bool isDisjoint(const Interval& other) const {
        return left < other.right && right < other.left;
    }
    // Hash function
    std::size_t operator()(const Interval& i) const {
        std::size_t result = static_cast<std::size_t>(i.left) + (static_cast<std::size_t>(i.right) << 16);
        return result;
    }
    // Comparison for equality
    bool operator ()(const Interval& lhs, const Interval& rhs) const  {
        bool result =  not lhs.isDisjoint(rhs);
        return  result;
    }
    //bool operator== (const Interval& other) const {
    //    bool result = not isDisjoint(other);
    //    return  result;
    //}

    // Simple output
    friend std::ostream& operator << (std::ostream& os, const Interval& i) {
        return os << '[' << i.left << ','<< i.right << ']';
    }
};

std::vector<Interval> intervals{ {1,5},{2,3},{1,10},{5,10},{6,8} };

int main() {

    std::unordered_set< Interval, Interval, Interval> disjointIntervals{};

    // Special test for first 2 intervals
    std::cout << ((not intervals[0].isDisjoint(intervals[1]))?"[1,5] is equal to [2,3]": "[1,5] is NOT equal to [2,3]") << '\n';

    for (const auto& i : intervals) {
        // Test for hash values. They are all different
        std::cout << "Interval: " << (static_cast<std::size_t>(i.left) + (static_cast<std::size_t>(i.right) << 16)) << " \tHash: " << (i.left + (i.right << 16)) << '\n';
        disjointIntervals.insert(i);
    }
    for (const auto& i : disjointIntervals)
        std::cout << i << '\n';
}

Upvotes: 0

Views: 801

Answers (2)

Lukas-T
Lukas-T

Reputation: 11370

std::unordered_set is a hash set. Elements will be organized into buckets, where the hash of an element identifies the bucket it's in. (That's how you get amortized O(1) insertion and lookup).

By the nature of hashing, different objects can produce the same hash. In this case and probably only in this case the comparison will be called to check wether two elements with equal hashes are indeed equal or not.

The two ranges [1,5] and [2,3] produce different hashes, thus will never be checked for equality.


It's not possible to produce identical hashes for overlapping ranges like you intend to. (unless the hash is always the same, which kind of defies the point of using a hash set).

I suggest to use different data structure. Maybe a sorted std::vector.

Upvotes: 3

463035818_is_not_an_ai
463035818_is_not_an_ai

Reputation: 123431

The named requirement Hash states:

All evaluations of h(k) executed within a given execution of a program yield the same result for the same value of k.

Also for std::hash, the same in different words:

For two parameters k1 and k2 that are equal, std::hash()(k1) == std::hash()(k2).

Roughly speaking, elements with the same hash end up in the same bucket, and because different elements may end up to have the same hash, only then they have to be checked for equality.

Your hash function does not produce the same hash for keys that are equal and thats wrong.

Upvotes: 1

Related Questions