Reputation: 1209
The current loop is:
#define N 3000
...
int i, j;
int a[N][N], b[N][N], c[N];
// Fill in b and c with random values
for (i = 0; i < n; ++i) {
for (j = 0; j < n; ++j) {
a[i][j] = b[i][j] / c[i];
}
}
My optimized version unrolls both outer and inner loop:
for (int i = 0; i < N; i += 2) {
for (int j = 0; j < N; j += 2) {
a[i][j] = b[i][j] / c[i];
a[i][j + 1] = b[i][j + 1] / c[i];
a[i + 1][j] = b[i + 1][j] / c[i + 1];
a[i + 1][j + 1] = b[i + 1][j + 1] / c[i + 1];
}
}
However, my instructor said that the second loop is not optimized very well. The indication to c(i) should be taken out of the loop over j. The loop is optimized by changing the order of indices. This way you make one sweep over the memory in the inner loop instead of zigzag-type of searches.
I am still not sure what he means since changing the order of indices would still make the loop traverse in a zig-zag type. What should be the correct solution for this case?
Upvotes: 0
Views: 124
Reputation: 50067
I'm not sure what your instructor is looking for, but you can make use of a reasonably well-known C technique known as Duff's Device here to help speed up your unrolled loop:
init_arrays();
precomputed_n = (N + 7) / 8;
for(i = 0 ; i < N ; ++i)
{
to = a[i];
from = b[i];
ci = c[i];
n = precomputed_n;
switch(N % 8)
case 0: do { *to++ = *from++ / ci;
case 7: *to++ = *from++ / ci;
case 6: *to++ = *from++ / ci;
case 5: *to++ = *from++ / ci;
case 4: *to++ = *from++ / ci;
case 3: *to++ = *from++ / ci;
case 2: *to++ = *from++ / ci;
case 1: *to++ = *from++ / ci;
} while (--n > 0);
}
Duff's Device is a handy way to unroll loops which combines a while
loop and a switch
.
Upvotes: 2
Reputation: 4084
Put int ci = c[i];
in the outer loop, and inner loop divides by ci.
Note that any reasonable compiler will do this for you.
Upvotes: 2