Reputation: 81
I have this code:
for(i=0; i<size; i++)
{
d[i] = d[i-1] + v[i];
}
When I do parallel processing for this loop, I have data dependency and the initiation interval becomes 2 Meaning I have:
initiation interval:2
|load v[i-1]|load d[i-2]| add |store d[i-1]|
| | | load v[i]|load d[i-1] | add | store d[i] |
I do not want to stall in between.
initiation interval:1
|load v[i-1]|load d[i-2]| add |store d[i-1]|
| |load v[i] |load d[i-1]| add | store d[i] |
This is not possible since d[i-1] is not stored yet.
How do we make initiation interval to 1 by changing the code?
Upvotes: 0
Views: 166
Reputation: 336
You cannot reduce that gap.
Also this (loop unrolling) is not the most efficient way of parallel processing for this kind of loop. Your loop looks like a prefix-sum operation. There are fast parallel algorithms and implementations available for prefix-sum. For example, this question
Upvotes: 1