Megan Darcy
Megan Darcy

Reputation: 582

how to optimize this code with unrolling factor 3?

void randomImprovedfunction(double a[], double p[], long n)
2 {
3      long i;
4      double last_v, v;
5      last_v = p[0] = a[0];
6      for (i=1; i<n; i++) {
7          v = last_v + a[i];
8          p[i] = v ;
9         last_v = v;
10     }
11
12     return ;
13 }

I have this function. as an exercise, i am told that it can be optimized using an unrolling factor of 3 and changing only lines 7-9. However, I am really lost on how this would be done.

Could someone show it to me?

Thank you! any help is appreciated :)

Upvotes: 1

Views: 722

Answers (1)

John D
John D

Reputation: 1637

Your main goal with unrolling is to make it easier for the CPU instruction pipeline to process instructions. Multiple instructions can be in process at the same time, and various factors can interrupt the smooth flow. Eg, data dependencies: if a later instruction needs to load data and that data is being changed by earlier instructions, the later instruction has to wait at its load stage until the earlier instructions have saved that data. That is called a pipeline stall.

See comments for why data dependency is the main bottleneck in this example.

The textbook example given in the Question seems to be mainly an exercise to get familiarity with manually unrolling loops and is not intended to investigate any performance issues.

Your first draft for the unrolling code looks like this, but you will get unwanted cases

for (i=1; i < n; i+=3) {
    v = last_v + a[i];
    p[i] = v ;
    last_v = v;
    v = last_v + a[i+1];
    p[i+1] = v ;
    last_v = v;
    v = last_v + a[i+2];
    p[i+2] = v ;
    last_v = v;
}

Unwanted cases - note that the last index you want to process is (n-1)

n Unwanted cases
5 Array indexes 1,2,3 then 4,5,6 => the unrolled code processes 2 unwanted cases, index 5 and 6
6 Array indexes 1,2,3 then 4,5,6 => the unrolled code processes 1 unwanted case, index 6
7 Array indexes 1,2,3 then 4,5,6 => no unwanted cases

See also Handling unrolled loop remainder

So, eliminate the last loop if there are any unwanted cases and you will then have

// `nn` is the adjusted loop limit to avoid an extra loop with unwanted cases 
int nn = n;
if ( (n-1)%3 != 0 ) nn = n - 3;

for (i=1; i < nn; i+=3) {
    v = last_v + a[i];
    p[i] = v ;
    last_v = v;
    v = last_v + a[i+1];
    p[i+1] = v ;
    last_v = v;
    v = last_v + a[i+2];
    p[i+2] = v ;
    last_v = v;
}

At this point we need to handle the remaining/missing cases:

n nn values of iterator i
5 2 1,4 => final i = n - 1
6 3 1,4 => final i = n - 2
7 7 1,4,7 => final i = n

If i = n - 1, you have 1 missing case, ie index n-1
If i = n - 2, you have 2 missing cases, ie index n-2 and n-1
If i = n, you're done

if ( i == n - 1 ) { // 1 missing case
    v = last_v + a[n-1]
    p[n-1] = v;
}
if ( i == n - 2 ) { // 2 missing cases
    v = last_v + a[n-2]
    p[n-2] = v;
    last_v = v;
    v = last_v + a[n-1]
    p[n-1] = v;
}

Upvotes: 2

Related Questions