Doot
Doot

Reputation: 745

Hashing an unordered container without needing to implement a comparison operator for the type

I'm looking to hash an unordered container, such as an unordered_map and unordered_set. For an ordered type, like a vector, boost::hash_range(v.begin(). v.end()) works well, but it is also dependent on order, e.g.

#include <boost/functional/hash.hpp>
#include <functional>
namespace std {
    template<>
    struct hash<std::vector<int>> {
        size_t operator ()(const std::vector<int>& v) const noexcept {
            return boost::hash_range(v.begin(), v.end());
        }
    };
}

Example of this working: https://coliru.stacked-crooked.com/a/0544c1b146ebeaa0

boost.org says

If you are calculating a hash value for data where the order of the data doesn't matter in comparisons (e.g. a set) you will have to ensure that the data is always supplied in the same order.

Okay, so that would seem easy - just sort the data in some way, but I don't want to do this every time I hash it. Using a normal map or set could work but I would need to do a fair bit of re-writing.

Additionally, this would require every type I use to have either >, <, <= or >= defined, as well as == and std::hash.

How can I hash a container so that the order does not matter?

Upvotes: 2

Views: 647

Answers (2)

otmar
otmar

Reputation: 416

After sorting the hash values of individual container elements, the sorted list of hash values can be hashed again to obtain a hash value for the unordered container.

Assume H1 is the hash function for a single element and H2 is the hash function for a list of hash values, then the hash value for some unordered container with elements A, B, and C could be calculated as H2(SORT(H1(A), H1(B), H1(C))). By construction, the resulting hash value will be independent of the order. In this way, you will also get a stronger hash value compared to combining the individual hash values using commutative operations.

Upvotes: 2

Jack
Jack

Reputation: 133577

The requirement seems rather logical, since the hash function is combining the previous elements hash with the current element hash somehow, then the order is important, because

H(A, B, C) is then computed as H(H(H(A), B), C) so that each intermediate result is used as input to the next element (think about a block cipher).

To hash a sequence of elements without caring for ordering you'd need a commutative hash function, so you'd be restricted to commutative operations (eg. XOR). I'm not sure how strong could be such an hash function but for your specific scenario it could be sufficient.

Upvotes: 2

Related Questions