Forwarding in multi-threaded code

Question

I am working on abstractions for a family of optimization algorithms. These algorithms can run serially or multi-threaded, either using locking mechanisms or atomic operations.

I have a question about perfect forwarding in the multi-threaded version of the algorithms. Say, for instance, I have a functor that I do not want to copy because copying it is expensive. I can make sure the functors are static, in the sense that calls to their operator()(...) do not change the state of the object. One such dummy functor is below:

#include <functional>
#include <iostream>
#include <iterator>
#include <thread>
#include <utility>
#include <vector>

template <class value_t> struct WeightedNorm {
  WeightedNorm() = default;
  WeightedNorm(std::vector<value_t> w) : w{std::move(w)} {}

  template <class Container> value_t operator()(Container &&c) const & {
    std::cout << "lvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

  template <class Container> value_t operator()(Container &&c) const && {
    std::cout << "rvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

private:
  std::vector<value_t> w;
};

This functor might also have the reference qualifiers for some of its member functions, as seen above (although, above, they are not different from each other). Moreover, the function objects are allowed to modify their input c. To perfect-forward this functor properly to the worker threads in the algorithm, I have thought of the following:

template <class value_t> struct algorithm {
  algorithm() = default;
  algorithm(const unsigned int nthreads) : nthreads{nthreads} {}

  template <class InputIt> void initialize(InputIt begin, InputIt end) {
    x = std::vector<value_t>(begin, end);
  }

  template <class Func> void solve_ref_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(&algorithm::kernel<Func &, decltype(x)>, this,
                           std::ref(f), x);
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_ref_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread([&, xlocal]() mutable { kernel(f, xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          &algorithm::kernel<decltype(std::forward<Func>(f)), decltype(x)>,
          this, std::ref(f), x); /* this is compilation error */
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          [&, xlocal]() mutable { kernel(std::forward<Func>(f), xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

private:
  template <class Func, class Container> void kernel(Func &&f, Container &&c) {
    std::forward<Func>(f)(std::forward<Container>(c));
  }

  std::vector<value_t> x;
  unsigned int nthreads{std::thread::hardware_concurrency()};
};

Basically, what I had in mind when writing the above was that algorithm::solve_ref_1 and algorithm::solve_ref_2 differ from each other only in the use of the lambda function. In the end, both of them call kernel with an lvalue reference to f and their own copy of x, where x is copied in each of the threads either because of how std::thread stores its arguments or because xlocal is captured by copy in the lambda. Is this correct? Should I be careful in preferring one to the other?
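To make the copy-versus-reference behaviour described above concrete, here is a minimal sketch. The helper names (touch_copy, touch_shared and the two size_after_* functions) are made up for illustration and are not part of the code in the question. It assumes only the documented behaviour of std::thread, which stores decayed copies of its arguments unless they are wrapped in std::ref.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: receives the thread's private copy as an rvalue,
// much like kernel's Container && parameter in the question.
inline void touch_copy(std::vector<int> &&v) { v.push_back(42); }

// Hypothetical worker: receives a reference to the caller's vector.
inline void touch_shared(std::vector<int> &v) { v.push_back(42); }

// std::thread copies x, so the caller's vector keeps its size.
inline std::size_t size_after_copying_worker() {
  std::vector<int> x{1, 2};
  std::thread worker(touch_copy, x);
  worker.join();
  return x.size();
}

// std::ref makes std::thread store a reference_wrapper, so the worker
// modifies the caller's vector.
inline std::size_t size_after_sharing_worker() {
  std::vector<int> x{1, 2};
  std::thread worker(touch_shared, std::ref(x));
  worker.join();
  return x.size();
}
```

With these definitions, size_after_copying_worker() should leave the original vector at two elements, while size_after_sharing_worker() should see three. Capturing by copy in a lambda gives the same isolation as the first variant.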

So far, I was not able to do what I wanted to achieve. I have not made an unnecessary copy of f, but I have not respected its reference qualifier, either. Then, I thought of forwarding f to kernel. Above, I couldn't find a way of making algorithm::solve_forward_1 compile due to the deleted constructor of std::ref for rvalue references. However, algorithm::solve_forward_2, which uses the lambda function approach, seems to be working. By "seems to be working," I mean that the following main program

int main(int argc, char *argv[]) {
  std::vector<double> x{1, 2};
  algorithm<double> alg(2);
  alg.initialize(std::begin(x), std::end(x));

  alg.solve_ref_1(WeightedNorm<double>{{1, 2}});
  alg.solve_ref_2(WeightedNorm<double>{{1, 2}});
  // alg.solve_forward_1(WeightedNorm<double>{{1, 2}});
  alg.solve_forward_2(WeightedNorm<double>{{1, 2}});

  return 0;
}

compiles and prints the following:

./main.out
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
rvalue version with w: 1,2
rvalue version with w: 1,2

In short, I have two major questions:

1. Is there any reason why I should prefer the lambda function version to the other (or vice versa), and,
2. Is perfect-forwarding the functor f more than once in my situation allowed/OK?

I am asking 2. above, because in the answer to a different question, the author says:

You cannot forward something more than once, though, because that makes no sense. Forwarding means that you're potentially moving the argument all the way through to the final caller, and once it's moved it's gone, so you cannot then use it again.

I assume that, in my case, I am not moving anything, but rather trying to respect the reference qualifier. In the output of my main program, I can see that w has the proper values in the rvalue version, i.e., 1,2, but that alone does not prove that I am not invoking undefined behavior, such as accessing the values of an already moved-from vector.
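To illustrate the assumption above, here is a small sketch with a hypothetical functor Probe and hypothetical helpers call and call_repeatedly (none of them are in my real code). It relies on the fact that std::forward is only a cast: it selects the ref-qualified overload but does not itself move anything, and a const && overload cannot modify the object.

```cpp
#include <utility>

// Hypothetical functor whose overloads only report which one was chosen.
struct Probe {
  int operator()() const & { return 1; }  // selected for lvalues
  int operator()() const && { return 2; } // selected for rvalues
};

// Forwards f into the call expression; no move constructor or move
// assignment runs here, only overload selection by value category.
template <class Func> int call(Func &&f) { return std::forward<Func>(f)(); }

// Forwards the same functor n times and sums what each call returned.
template <class Func> int call_repeatedly(Func &&f, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += call(std::forward<Func>(f));
  return sum;
}
```

Calling call_repeatedly with a named Probe should pick the & overload every time, and calling it with Probe{} should pick the && overload every time. This only stays safe as long as the &&-qualified overload does not actually consume the functor's state; a non-const && overload that moves w out would behave differently on the second call.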

I would appreciate if you helped me understand this better. I am also open to any other feedback about the way I am trying to solve my problem.

StenSoft · Accepted Answer

1. There is no reason to prefer either.
2. Forwarding inside a for loop is not OK. You can't forward the same variable twice:

template <class T> void func(T &&param) {
  func1(std::forward<T>(param));
  func2(std::forward<T>(param)); // unsafe: param may already have been moved from
}

Chain forwarding (std::forward<T>(std::forward<T>(…))), on the other hand, is fine.
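As a rough illustration of why forwarding twice is a problem, here is a sketch with a hypothetical sink function consume that takes its argument by value. It assumes that a std::vector is left empty after being move-constructed from, which holds on the mainstream standard library implementations.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sink: takes ownership of whatever it is given.
inline std::size_t consume(std::vector<int> v) { return v.size(); }

// Forwards the same parameter twice and returns the size each call saw.
template <class T>
std::pair<std::size_t, std::size_t> forward_twice(T &&param) {
  std::size_t first = consume(std::forward<T>(param));
  // If param was an rvalue, the first call already moved from it.
  std::size_t second = consume(std::forward<T>(param));
  return {first, second};
}
```

Passing a named vector copies it on both calls, so both sizes match. Passing a temporary lets the first call steal the contents, so the second call sees an emptied vector. If the callee only reads through the reference and never moves from it, repeated forwarding does no harm, but the caller cannot rely on that in general.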
