Reputation: 3171
If I know I'm going to insert large amount of data (about one million entries) into std::unordered_map
, is there anything I can do in advance to boost the performance? (just like std::vector::reserve
can reserve enough memory space to avoid reallocate when I roughly know the size of data before bulk insert)
More specifically, the key in hashmap is a coordinate in 2D plane with customized hash function, as shown below
using CellIndex = std::pair<int32_t, int32_t>;
struct IdxHash {
std::size_t operator()(const std::pair<int32_t, int32_t> &idx) const { return ((size_t)idx.second << 31) ^ idx.first; }
};
std::unordered_map<CellIndex, double, IdxHash> my_map;
// bluk insert into my_map
...
Upvotes: 2
Views: 1747
Reputation: 24738
std::unordered_map
is typically implemented as a chained hash table with linked lists. As such, inserting into an std::unordered_map
takes constant time on average, and linear time in the size of the container in the worst case. This worst-case scenario for insertion corresponds to the case when the hash table elements must be rehashed because the current number of buckets in the table is insufficient to satisfy the load factor, and, therefore, a reallocation of the array of buckets is needed.
Keeping this in mind, if you know in advance the number of elements to insert into the std::unordered_map
, you should consider std::unordered_map::reserve()
to prevent rehashing from happening at insertion. This way, you will avoid both the bucket array reallocation and the rehashing from occurring.
std::unordered_map::insert()
with hintAs with std::map
, there are some overloads of the insert()
member function that take a so called hint:
iterator insert(const_iterator hint, const value_type& value);
This hint iterator may be used to provide some additional information that can be used to speed the insertion up. However, the existence of these member functions in std::unordered_map
taking a hint is only for interface compatibility reasons, to make its interface more suitable for generic programming. So, they don't improve the insertion time.
How perfect your hash function is shouldn't really matter when it comes to insertion time – only how fast it calculates the hash of a key. However, it becomes relevant when looking up elements in the hash table by their keys.
Upvotes: 1
Reputation: 959
reserve(x)
prepares the unordered container for x
number of elements. In comparison rehash(x)
prepares the unordered container for x/max_load_factor()
number of elements.
Also regarding your hash function, if you're aiming for it to return a unique value for a pair of coordinates, then it should return ((size_t)idx.second << 32) ^ idx.first
. ((size_t)idx.second << 31) ^ idx.first
will return the same value for (1, -1)
and (0, 2^31-1)
.
Upvotes: 0