std::unordered_map how to optimize bulk insert if input size is known

Question

If I know I'm going to insert large amount of data (about one million entries) into std::unordered_map, is there anything I can do in advance to boost the performance? (just like std::vector::reserve can reserve enough memory space to avoid reallocate when I roughly know the size of data before bulk insert)

More specifically, the key in hashmap is a coordinate in 2D plane with customized hash function, as shown below

using CellIndex = std::pair;

struct IdxHash {
  std::size_t operator()(const std::pair &idx) const { return ((size_t)idx.second << 31) ^ idx.first; }
};

std::unordered_map my_map;

// bluk insert into my_map
...

jfMR · Accepted Answer

std::unordered_map is typically implemented as a chained hash table with linked lists. As such, inserting into an std::unordered_map takes constant time on average, and linear time in the size of the container in the worst case. This worst-case scenario for insertion corresponds to the case when the hash table elements must be rehashed because the current number of buckets in the table is insufficient to satisfy the load factor, and, therefore, a reallocation of the array of buckets is needed.

Keeping this in mind, if you know in advance the number of elements to insert into the std::unordered_map, you should consider std::unordered_map::reserve() to prevent rehashing from happening at insertion. This way, you will avoid both the bucket array reallocation and the rehashing from occurring.

`std::unordered_map::insert()` with hint

As with std::map, there are some overloads of the insert() member function that take a so called hint:

iterator insert(const_iterator hint, const value_type& value);

This hint iterator may be used to provide some additional information that can be used to speed the insertion up. However, the existence of these member functions in std::unordered_map taking a hint is only for interface compatibility reasons, to make its interface more suitable for generic programming. So, they don't improve the insertion time.

About the hash function

How perfect your hash function is shouldn't really matter when it comes to insertion time – only how fast it calculates the hash of a key. However, it becomes relevant when looking up elements in the hash table by their keys.

std::unordered_map how to optimize bulk insert if input size is known

Answers (2)

`std::unordered_map::insert()` with hint

About the hash function

Related Questions

std::unordered_map how to optimize bulk insert if input size is known

Answers (2)

std::unordered_map::insert() with hint

About the hash function

Related Questions

`std::unordered_map::insert()` with hint