Reputation: 536
It came out as a surprise when my parallelized version of fibonacci implementation(inefficient, just to compare the performance of the library) was much much slower than normal inefficient implementation, even after using all the 8 logical cores of my i7-6700HQ processor. Processor fans started going haywire the processing time was very slow compared to non-parallel implementation.
The example is directly from the TBB tutorial on intel - https://www.threadingbuildingblocks.org/tutorial-intel-tbb-task-based-programming
Here is my code
#include <tbb/task_group.h>
#include <chrono>
#include <iostream>
#define FIB_NUM 40
long fib1(int n)
{
if(n < 2) return n;
else
{
int x, y;
tbb::task_group g;
g.run([&]{x=fib1(n - 1);});
g.run([&]{y=fib1(n - 2);});
g.wait();
return x + y;
}
}
long fib2(int n)
{
return n < 2? n : fib2(n - 1) + fib2(n - 2);
}
int main()
{
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << fib2(FIB_NUM) << std::endl;
auto t2 = std::chrono::high_resolution_clock::now();
std::cout << (t2 - t1).count() << std::endl;
t1 = std::chrono::high_resolution_clock::now();
std::cout << fib1(FIB_NUM) << std::endl;
t2 = std::chrono::high_resolution_clock::now();
std::cout << (t2 - t1).count() << std::endl;
}
I dont know what I am doing wrong. It would be helpful if someone could point it out.
Thanks
Upvotes: 3
Views: 797
Reputation: 632
The main problem of the example is small tasks. The leaf tasks (n<2
) just calculate return n
. Undoubtedly, it is inefficient for parallelism. The example can be improved with "cutoff" condition when the sub-problem is considered to be too small for parallelization. Let's suppose that there is no sense to calculate the first 12 fibonacci numbers in parallel and we will use a serial implementation instead:
long fib1(int n)
{
// Use a serial implementation for "small" numbers.
if(n < 12) return fib2(n);
else
{
int x, y;
tbb::task_group g;
g.run([&]{x=fib1(n - 1);});
g.run([&]{y=fib1(n - 2);});
g.wait();
return x + y;
}
}
Perhaps, you want to read articles about Divide and Conquer and The Task Scheduler.
P.S. Intel TBB uses task based approach for parallelism. The method tbb::task_group::run
creates a task (not a thread) that is executed when one of threads from the thread pool is available. So it does not matter how many tasks in the system - the number of threads is always limited.
Upvotes: 7