TBB scheduling time is too long when using parallel_deterministic_reduce

Question

Following the article "Scheduling Overhead in an Intel® oneAPI Threading Building Blocks Application", I wrote a program: test_tbb_perf_vtune_profiler

I then tested the program using Intel VTune Profiler:

The call stack for arena_slot::get_task and TBB Schedule Internals:

When I changed the code to use tbb::task_arena ta(8), there was no significant change in performance:

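#include <functional>
#include <numeric>
#include <tbb/tbb.h>

double VectorReduction(double* v, size_t n) {
  // Run the reduction inside an arena limited to 8 threads.
  tbb::task_arena ta(8);
  double sum = ta.execute([&]() {
    return tbb::parallel_deterministic_reduce(
      tbb::blocked_range<double*>(v, v + n), 0.0,
      // Sum one sub-range sequentially, starting from the running value.
      [](const tbb::blocked_range<double*>& r, double value) -> double { return std::accumulate(r.begin(), r.end(), value); },
      // Combine two partial sums.
      std::plus<double>());
  });

  return sum;
}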

My question is: is there any way to reduce the scheduling time of Intel TBB?
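
One thing that may be worth checking is the grain size. tbb::blocked_range defaults to a grain size of 1, and parallel_deterministic_reduce uses simple_partitioner by default, which keeps splitting a range for as long as it is divisible. The sketch below is a simplified model of that splitting rule, not TBB's own code; CountLeafChunks is a hypothetical helper that only counts how many leaf chunks (and so roughly how many body invocations) a given grain size would produce.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model (an assumption, not TBB source code): a range of `size`
// elements is divisible while size > grain, and each split cuts it at the
// midpoint. Returns the number of leaf chunks, i.e. how many times the
// reduction body would run if every divisible range were split.
std::size_t CountLeafChunks(std::size_t size, std::size_t grain) {
  assert(grain > 0);
  if (size <= grain) return 1;        // not divisible: processed as one chunk
  const std::size_t left = size / 2;  // lower half of a midpoint split
  return CountLeafChunks(left, grain) + CountLeafChunks(size - left, grain);
}
```

Under this model, CountLeafChunks(1000000, 1) gives 1000000 chunks, while CountLeafChunks(1024, 256) gives only 4. In the real code the matching knob would be the optional third constructor argument of the range, for example tbb::blocked_range<double*>(v, v + n, grain), where grain is a value that would have to be tuned by measurement.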
