roderick

Reputation: 89

TBB scheduling time is too long when using parallel_deterministic_reduce

Following the "Scheduling Overhead in an Intel® oneAPI Threading Building Blocks Application" recipe, I wrote a test program: test_tbb_perf_vtune_profiler

I then profiled the program with Intel VTune Profiler: [screenshot of the VTune results]

The call stacks for arena_slot::get_task and TBB Scheduler Internals:

[screenshot: arena_slot::get_task call stack] [screenshot: TBB Scheduler Internals call stack]

When I changed the code to run inside a tbb::task_arena ta(8);, there was no significant change in performance:

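#include <cstddef>
#include <functional>
#include <numeric>
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <tbb/task_arena.h>

double VectorReduction(double* v, std::size_t n) {
  tbb::task_arena ta(8);  // arena with a maximum concurrency of 8
  // Deterministic parallel sum over [v, v + n), run inside the arena
  double sum = ta.execute([&]() {
    return tbb::parallel_deterministic_reduce(
      tbb::blocked_range<double*>(v, v + n), 0.0,
      [](const tbb::blocked_range<double*>& r, double value) -> double {
        return std::accumulate(r.begin(), r.end(), value);
      },
      std::plus<double>());
  });

  return sum;
}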

My question is: is there any way to reduce the scheduling time of Intel TBB?
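One factor that may be worth checking, offered as a sketch and not as a confirmed diagnosis: the blocked_range in the snippet above is built without a grainsize argument, so it uses the default grainsize of 1, and parallel_deterministic_reduce uses simple_partitioner by default, which keeps halving the range while its size exceeds the grainsize. The stdlib-only model below (it does not call TBB) counts how many leaf subranges such halving would produce. The function name count_leaf_tasks and the grainsize value 10000 are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of recursive range halving, assuming the range is split
// at its midpoint while its size exceeds the grainsize. Returns the number
// of leaf subranges, i.e. the number of times the reduction body would run.
// This is an illustration only; it does not use or measure TBB itself.
std::size_t count_leaf_tasks(std::size_t n, std::size_t grainsize) {
  assert(grainsize > 0);
  if (n <= grainsize) {
    return 1;  // no longer divisible: one leaf
  }
  const std::size_t left = n / 2;  // midpoint split
  return count_leaf_tasks(left, grainsize) +
         count_leaf_tasks(n - left, grainsize);
}
```

Under this model, a range of 1 << 20 elements yields 1048576 leaves with grainsize 1 but only 128 leaves with grainsize 10000. If this is what happens in the real program, passing an explicit grainsize as the third constructor argument, for example tbb::blocked_range<double*>(v, v + n, 10000), would be one thing to try. The right value depends on the workload and would need to be tuned by measurement.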

Upvotes: 0

Views: 42

Answers (0)
