Following the recipe "Scheduling Overhead in an Intel® oneAPI Threading Building Blocks Application", I wrote a program, test_tbb_perf_vtune_profiler, and profiled it with Intel VTune Profiler.
The profile shows the call stacks for arena_slot::get_task and TBB Schedule Internals.
When I changed the code to run the reduction inside tbb::task_arena ta(8); there was no significant performance change:
    double VectorReduction(double* v, size_t n) {
        tbb::task_arena ta(8);
        double sum = ta.execute([&]() {
            return tbb::parallel_deterministic_reduce(
                tbb::blocked_range<double*>(v, v + n), 0.0,
                [](const tbb::blocked_range<double*>& r, double value) -> double {
                    return std::accumulate(r.begin(), r.end(), value);
                },
                std::plus<double>());
        });
        return sum;
    }
My question is: is there any way to reduce the scheduling time of Intel TBB?