ppetraki

Reputation: 428

how to specialize std::hash call operator for return types other than size_t?

This is the first time I've tried using std::unordered_map, c++17. I'm trying to build a quick LRU where I'm mapping sha1 digests to blocks of data. My sha1 object is fully comparable etc, but when I try to instantiate the map, I get this error:

/usr/include/c++/7/bits/hashtable_policy.h:87:34: error: no match for call to ‘(const std::hash<
kbs::crypto::basic_digest<20,kbs::crypto::openssl_sha1_digest_engine_policy> >) (const kbs::crypto::basic_digest<20, kbs::crypto::openssl_sha1_digest_engine_policy>&)’
  noexcept(declval<const _Hash&>()(declval<const _Key&>()))>

So it looks like I can specialize std::hash for my user-defined type. But it always returns size_t, which is bad: going from 20 bytes to 8 bytes kind of defeats the purpose of using sha1 for a hash key.

Is there a way to work around this? Alternative containers? It's a waste to have to write my own hashmap. I guess I could use std::set...

Upvotes: 0

Views: 746

Answers (1)

Yakk - Adam Nevraumont

Reputation: 275896

Unordered map does not assume the hashes (the size_ts) are unique. It assumes keys are. And it performs well if the hashes are good.

Unordered map uses the size_t to determine which bucket to put the data into. It handles collisions in the size_t space fine.

Map your sha hash to a size_t however you want, and use the sha hash itself as your key. In the unlikely event you get a size_t hashing collision (roughly 50/50 once you have about 4 billion elements, assuming good hashing; see the "birthday paradox" for the math. Collisions within the table's current bucket count happen more often, but the table grows dynamically), the map falls back on equality of your sha hash keys to avoid "real" collisions.

There are multiple kinds of collisions.

  • Sha-hash collisions: bad, means same key different data.
  • size_t hash collisions: meh, means the two elements will always be in same bucket.
  • internal hash table collisions: common, means that at this specific size, the two elements are in the same bucket.

Unless most of your data maps to the same size_t, that mapping being lossy is perfectly fine. You only need to worry about the first kind of collision, and provide == on your sha-hashes.

Upvotes: 4
