justHelloWorld
justHelloWorld

Reputation: 6844

Not sure how unordered_map works

I've a little bit confusion how unordered_map works and what buckets are and how thet are managed.

From this blog post, unordered_map is vector of vectors.

My questions are:

Sorry for these questions, but I didn't find any detailed explanation how this structure works (on cppreference.com for example).

Upvotes: 7

Views: 4408

Answers (2)

Cubbi
Cubbi

Reputation: 47498

std::unordered_map is the standard C++ hash table. It used to be called hash_map in STL, but missed the boat when many of STL's interfaces were merged into C++ in 1998, and by 2011, so many libraries had their own hash_map, C++ had to pick another name (I think "unordered" was a great choice; assuming order in a hash table is a common source of bugs).

is it correct to assume that the buckets are the "internal" vectors?

no, it is both incorrect (incompatible with iterator invalidation requirements) and dangerous (under that assumption you may end up subtracting pointers to elements in the same bucket).

In real life, the buckets are linked lists; e.g.

is it correct to assume that we have to define the equal method on the key type (in addiction to the hash operator) in order to find the key inside the bucket?

Yes, locating the key in a bucket is exactly what the 4th template parameter of std::unordered_map is for (does not need to call the "equal method on the key type" literally, of course)

what is the external vector (hash table) size by default?

There is no "external vector". The number of buckets for a default-constructed std::unordered_map is implementation-defined, you can query it with bucket_count.

what is the internal vector size by default?

There is no "internal vector". The size of any given bucket equals the number of elements currently placed in the bucket. You can query it with bucket_size

what happens if the number of elements in one bucket becomes too big?bor in other words, when the rehash happens?

Nothing happens if the number of elements in one bucket becomes too big. But if average number of elements per bucket (aka load_factor) exceeds max_load_factor, rehash happens (e.g. on insert)

Upvotes: 7

Todd Christensen
Todd Christensen

Reputation: 1327

This may help you understand the buckets: http://www.cplusplus.com/reference/unordered_map/unordered_map/bucket_count/ http://www.cplusplus.com/reference/unordered_map/unordered_map/max_load_factor/

But generally, yes, the buckets are something like internal vectors. It needs an equality operator (or a predicate) to distinguish keys that had the same hash as you suggest.

The initial number of buckets is possibly 0. It can be set via rehash() or reserve() (they have slightly different semantics.)

http://www.cplusplus.com/reference/unordered_map/unordered_map/rehash/

In an ideal case, each bucket will only have the one item. You can check this by using bucket_size. When the load factor (total items vs. bucket count) gets high, it rehashes automatically.

By default, it'll aim for a 1:1 load factor. If the hash function is good, this may last until max_bucket_count items are inserted.

Keep in mind that the specific implementation of this may vary. Each implementation (e.g. from different platforms or standard libraries) really only needs to have the correct semantics.

If these answers are important to your program, you can query the values as I've described. If you're just trying to wrap your head around it, query them in some test scenarios and it may become more clear.

Upvotes: 2

Related Questions