Reputation: 6751
I have a parallel code that can be reduced to basically:
#include <algorithm>
#include <vector>
struct TKeyObjPtr;
class TObj
{
public:
virtual void Calculate(TKeyObjPtr const &) = 0;
};
struct TKeyObjPtr
{
int Key;
TObj *ObjPtr;
};
void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec)
{
#pragma omp parallel for
for (auto It1= KeyObjPtrVec.begin(); It1!=KeyObjPtrVec.end(); ++It1)
for (auto It2= It1+1; It2!=KeyObjPtrVec.end() && It2->Key==It1->Key; ++It2)
It1->ObjPtr->Calculate(*It2);
}
I would like to modernize that code by using the c++17 parallel algorithms. Unfortunately, I'm having trouble in rewriting such a simple piece of code.
An option would be using boost::counting_iterator
:
void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec)
{
std::for_each(std::execution::par_unseq,
boost::counting_iterator<std::size_t>(0u),
boost::counting_iterator<std::size_t>(KeyObjPtrVec.size()),
[&KeyObjPtrVec](auto i)
{
for (auto j= i+1; j<KeyObjPtrVec.size() && KeyObjPtrVec[j].Key==KeyObjPtrVec[i].Key; ++j)
KeyObjPtrVec[i].ObjPtr->Calculate(KeyObjPtrVec[j]);
});
}
This works but is considerably more verbose and, worse, I don't think it is compliant with the
standard because boost::counting_iterator
is a stashing iterator and, therefore, does not
meet the Cpp17ForwardIterator requirements.
Is it possible to write the above code as concisely as with OpenMP, while satisfying the constraints of the standard on parallel algorithms?
Upvotes: 8
Views: 424
Reputation: 21778
If I get the algorithm right, we must
KeyObjPtrVec
vector,Key
field of the structure matches, we must call Calculate
on the Obj
field of our structure.C++17 has the excellent built-in parallelization support, and this can simply done with the standard libraries:
#include <algorithm>
#include <execution>
void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec) {
std::vector<size_t> indices(KeyObjPtrVec.size());
for (size_t k = 0; k < indices.size(); k++) {
indices[k] = k;
}
std::for_each(std::execution::par, indices.begin(), indices.end(),
[KeyObjPtrVec, indices](size_t k) {
std::for_each(std::execution::par, indices.begin() + k + 1, indices.end(),
[k, KeyObjPtrVec](size_t l) {
if (KeyObjPtrVec[k].Key == KeyObjPtrVec[l].Key) {
KeyObjPtrVec[k].ObjPtr->Calculate(KeyObjPtrVec[l]);
}
});
});
}
If you specify std::execution::par
as the first parameter of std::for_each
, the parallel version runs. No need to care about executor, thread synchronization or any cleanup - all just works out of box. Just link the library tbb
(target_link_libraries(myexecutable tbb)
for cmake).
With std::for_each you get access the element you are supposed to process, not the iterator. To get more free access required for setting up the second std::for_each
, we create array of indices and iterate over them instead. Having index of the element, we can access the array at any relative position in the two lambdas we have.
This code is tested and confirmed to work as described in the beginning of the answer.
Upvotes: 0