metalfox
metalfox

Reputation: 6751

Parallelizing a simple loop with c++17 algorithms

I have a parallel code that can be reduced to basically:

#include <algorithm>
#include <vector>

struct TKeyObjPtr;

class TObj
{
public:
  virtual void Calculate(TKeyObjPtr const &) = 0;
};

struct TKeyObjPtr
{
  int Key;
  TObj *ObjPtr;
};

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec)
{
  #pragma omp parallel for
  for (auto It1= KeyObjPtrVec.begin(); It1!=KeyObjPtrVec.end(); ++It1)
    for (auto It2= It1+1; It2!=KeyObjPtrVec.end() && It2->Key==It1->Key; ++It2)
      It1->ObjPtr->Calculate(*It2);
}

I would like to modernize that code by using the parallel algorithms. Unfortunately, I'm having trouble in rewriting such a simple piece of code.

An option would be using boost::counting_iterator:

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec)
{
  std::for_each(std::execution::par_unseq,
    boost::counting_iterator<std::size_t>(0u),
    boost::counting_iterator<std::size_t>(KeyObjPtrVec.size()),
    [&KeyObjPtrVec](auto i)
      {
        for (auto j= i+1; j<KeyObjPtrVec.size() && KeyObjPtrVec[j].Key==KeyObjPtrVec[i].Key; ++j)
          KeyObjPtrVec[i].ObjPtr->Calculate(KeyObjPtrVec[j]);
      });
}

This works but is considerably more verbose and, worse, I don't think it is compliant with the standard because boost::counting_iterator is a stashing iterator and, therefore, does not meet the Cpp17ForwardIterator requirements.

Is it possible to write the above code as concisely as with OpenMP, while satisfying the constraints of the standard on parallel algorithms?

Upvotes: 8

Views: 424

Answers (1)

Audrius Meškauskas
Audrius Meškauskas

Reputation: 21778

If I get the algorithm right, we must

  • iterate over all possible value pairs in the vector of structures, provided KeyObjPtrVec vector,
  • but without pairing the object with itself
  • and the order of the pair members is not important
  • Then, if they Key field of the structure matches, we must call Calculate on the Obj field of our structure.

C++17 has the excellent built-in parallelization support, and this can simply done with the standard libraries:

#include <algorithm>
#include <execution>

void Calculate(std::vector<TKeyObjPtr> const &KeyObjPtrVec) {
  std::vector<size_t> indices(KeyObjPtrVec.size());
  for (size_t k = 0; k < indices.size(); k++) {
    indices[k] = k;
  }

  std::for_each(std::execution::par, indices.begin(), indices.end(),
      [KeyObjPtrVec, indices](size_t k) {
        std::for_each(std::execution::par, indices.begin() + k + 1, indices.end(),
            [k, KeyObjPtrVec](size_t l) {
              if (KeyObjPtrVec[k].Key == KeyObjPtrVec[l].Key) {
                KeyObjPtrVec[k].ObjPtr->Calculate(KeyObjPtrVec[l]);
              }
            });
      });
}

If you specify std::execution::par as the first parameter of std::for_each, the parallel version runs. No need to care about executor, thread synchronization or any cleanup - all just works out of box. Just link the library tbb (target_link_libraries(myexecutable tbb) for cmake).

With std::for_each you get access the element you are supposed to process, not the iterator. To get more free access required for setting up the second std::for_each, we create array of indices and iterate over them instead. Having index of the element, we can access the array at any relative position in the two lambdas we have.

This code is tested and confirmed to work as described in the beginning of the answer.

Upvotes: 0

Related Questions