Ziqi Liu
Ziqi Liu

Reputation: 3171

std::unordered_map how to optimize bulk insert if input size is known

If I know I'm going to insert large amount of data (about one million entries) into std::unordered_map, is there anything I can do in advance to boost the performance? (just like std::vector::reserve can reserve enough memory space to avoid reallocate when I roughly know the size of data before bulk insert)

More specifically, the key in hashmap is a coordinate in 2D plane with customized hash function, as shown below

using CellIndex = std::pair<int32_t, int32_t>;

struct IdxHash {
  std::size_t operator()(const std::pair<int32_t, int32_t> &idx) const { return ((size_t)idx.second << 31) ^ idx.first; }
};

std::unordered_map<CellIndex, double, IdxHash> my_map;

// bluk insert into my_map
...

Upvotes: 2

Views: 1747

Answers (2)

jfMR
jfMR

Reputation: 24738

std::unordered_map is typically implemented as a chained hash table with linked lists. As such, inserting into an std::unordered_map takes constant time on average, and linear time in the size of the container in the worst case. This worst-case scenario for insertion corresponds to the case when the hash table elements must be rehashed because the current number of buckets in the table is insufficient to satisfy the load factor, and, therefore, a reallocation of the array of buckets is needed.

Keeping this in mind, if you know in advance the number of elements to insert into the std::unordered_map, you should consider std::unordered_map::reserve() to prevent rehashing from happening at insertion. This way, you will avoid both the bucket array reallocation and the rehashing from occurring.

std::unordered_map::insert() with hint

As with std::map, there are some overloads of the insert() member function that take a so called hint:

iterator insert(const_iterator hint, const value_type& value);

This hint iterator may be used to provide some additional information that can be used to speed the insertion up. However, the existence of these member functions in std::unordered_map taking a hint is only for interface compatibility reasons, to make its interface more suitable for generic programming. So, they don't improve the insertion time.

About the hash function

How perfect your hash function is shouldn't really matter when it comes to insertion time – only how fast it calculates the hash of a key. However, it becomes relevant when looking up elements in the hash table by their keys.

Upvotes: 1

srt1104
srt1104

Reputation: 959

reserve(x) prepares the unordered container for x number of elements. In comparison rehash(x) prepares the unordered container for x/max_load_factor() number of elements.

Also regarding your hash function, if you're aiming for it to return a unique value for a pair of coordinates, then it should return ((size_t)idx.second << 32) ^ idx.first. ((size_t)idx.second << 31) ^ idx.first will return the same value for (1, -1) and (0, 2^31-1).

Upvotes: 0

Related Questions