Reputation: 157
I have
cilk_for (int i = 0; i < 100; i++)
x = fib(35);
the above takes 6.151 seconds
and
for (int i = 0; i < 100; i++)
x = cilk_spawn fib(35);
takes 5.703 seconds
The fib(x)
is the horrible recursive Fibonacci number function. If I dial down the fib function cilk_for
does better than cilk_spawn
, but it seems to me that regardless of the time it takes to do fib(x) cilk_for
should do better than cilk_spawn
.
What don't I understand?
Upvotes: 1
Views: 927
Reputation: 4049
Per comments, the issue was a missing cilk_sync. I'll expand on that to point out exactly how the ratio of time can be predicted with surprising accuracy.
On a system with P hardware threads (typically 8 on a i7) for/cilk_spawn code will execute as follows:
Thus the threads will execute the loop hand-over-hand, and the loop exits at a point where P-1 threads are still working on iterations. So the loop can be expected to finish after evaluating only (100-P-1) iterations.
So for 8 hardware threads, the for/cilk_spawn with missing cilk_sync should take about 93/100 of the time for the cilk_for, quite close to the observed ratio of about 5.703/6.151 = 0.927.
In contrast, in a "child steal" system such as TBB or PPL task_group, the loop will race to completion, generating 100 tasks, and then keep going until a call to task_group::wait. In that case, forgetting the synchronization would have led to a much more dramatic ratio of times.
Upvotes: 2