Reputation: 3887
I need to fill a std::unordered_map<int,T> with about 100 entries. Those are expensive to construct and I would like to use OpenMP to do that concurrently:
unordered_map<int, T> mapWithTs;
#pragma omp parallel for schedule(dynamic) // dynamic because T constructs in some unpredictable time.
for(int i=0; i<100; ++i)
{
    mapWithTs.emplace(i, T(i)); // calls the constructor T(i)
}
I read that the map will rehash and then iterators will no longer be valid. What must I do to make this work?
Furthermore, what would a concurrent solution using only the standard library look like?
Upvotes: 3
Views: 1170
Reputation: 5304
As noted by Galik and yeoman, it is essential that moving your objects is a cheap operation. If it already is (construction is heavy, but moving is cheap), then you are fine. Otherwise you should wrap your objects in a unique_ptr. After that, a rehash will be a cheap operation too (yes, a rehash takes linear time, but it is O(1) amortized over the insertions), so you don't need to worry about rehashing.
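For instance, a rough sketch of the unique_ptr variant (T here is only a stand-in for the expensive type from the question) could look like this:

#include <memory>
#include <unordered_map>

struct T { explicit T(int) { /* expensive construction */ } }; // stand-in for your type

int main()
{
    // The map only stores (and moves) unique_ptrs, so a rehash shuffles pointers
    // around and never touches the expensive T objects themselves.
    std::unordered_map<int, std::unique_ptr<T>> mapWithTs;
    mapWithTs.emplace(0, std::make_unique<T>(0));
}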
The next thing is filling the map. You are working with it from several threads, so you have to ensure that no more than one thread works with it at a time. You need something like #pragma omp critical or a std::mutex. And here comes the important part: if you use emplace the way you do now, the heavy T constructor will be executed inside the critical section, which kills the whole point of the parallelization. So in this particular case you should construct the T object beforehand, then enter the critical section and move the object into the hash map.
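A minimal sketch of that pattern with OpenMP, assuming T is cheaply movable as discussed above, could look like this:

#include <unordered_map>
#include <utility>

struct T { explicit T(int) { /* expensive construction */ } }; // stand-in for your type

std::unordered_map<int, T> buildMap()
{
    std::unordered_map<int, T> mapWithTs;
    mapWithTs.reserve(100); // with a known size, rehashing can also be avoided entirely

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < 100; ++i)
    {
        T t(i);                             // heavy construction runs in parallel
        #pragma omp critical
        mapWithTs.emplace(i, std::move(t)); // only the cheap move is serialized
    }
    return mapWithTs;
}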
If construction of T really is a heavy operation (it takes much longer than inserting a value into an unordered_map), then that is all you need; you won't get a performance gain from building per-thread lists and inserting them into the map afterwards. Otherwise, yeoman's answer can give you additional benefits at the cost of making your code more complex.
Upvotes: 0
Reputation: 1681
In case these expensive-to-construct instances are held by reference, i.e. via shared_ptr, raw pointer, etc., I suggest letting each thread create its own, stack-local map, in a step canonically also called "map", and then combining them all in a single thread in a step canonically called "reduce" (see the sketch below).
This is called the "map-reduce" algorithm.
"map" is the usual name of a function that applies a function to all elements of a collection
"reduce" is the usual name of a function that combines all elements in a collection to a single value by calling a function with the current intermediate result and each element
hence the name :)
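A rough sketch of that map/reduce approach, assuming the instances are held via shared_ptr and T is again just a stand-in for your expensive type, could look like this:

#include <memory>
#include <unordered_map>
#include <utility>

struct T { explicit T(int) { /* expensive construction */ } }; // stand-in for your type

std::unordered_map<int, std::shared_ptr<T>> buildMap()
{
    std::unordered_map<int, std::shared_ptr<T>> mapWithTs;

    #pragma omp parallel
    {
        // "map" step: every thread fills its own local map, no locking needed
        std::unordered_map<int, std::shared_ptr<T>> localMap;

        #pragma omp for schedule(dynamic) nowait
        for (int i = 0; i < 100; ++i)
            localMap.emplace(i, std::make_shared<T>(i));

        // "reduce" step: merge the local maps into the shared one, one thread at a time
        #pragma omp critical
        {
            for (auto& kv : localMap)
                mapWithTs.emplace(kv.first, std::move(kv.second));
        }
    }
    return mapWithTs;
}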
Upvotes: 2