Reputation: 15
I have the following loop, which I'm compiling with icc:
for (int i = 0; i < arrays_size; ++i) {
    total = total + C[i];
}
The vectorization report says this loop has been vectorised, but I don't understand how this is possible, since there's an obvious read-after-write dependence on total.
The report output is the following:
LOOP BEGIN at loops.cpp(46,5)
remark #15388: vectorization support: reference C has aligned access [ loops.cpp(47,7) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 8
remark #15309: vectorization support: normalized vectorization overhead 0.475
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15475: --- begin vector loop cost summary ---
remark #15476: scalar loop cost: 5
remark #15477: vector loop cost: 1.250
remark #15478: estimated potential speedup: 3.990
remark #15488: --- end vector loop cost summary ---
remark #25015: Estimate of max trip count of loop=31250
LOOP END
Can someone explain what this means and how it is possible to vectorise this loop?
Upvotes: 0
Views: 144
Reputation: 18807
Depending on the type of total and C[i], you can exploit the associativity and commutativity of addition and first sum 4 or 8 (or more) sub-totals:
int subtotal[4] = {0,0,0,0};
// only process full groups of 4 so that C[i+k] never reads past the end
for (int i = 0; i + 3 < arrays_size; i += 4) {
    for (int k = 0; k < 4; ++k)
        subtotal[k] += C[i+k];
}
// handle remaining elements of C, if necessary ...
// sum-up sub-totals:
total = (subtotal[0]+subtotal[2]) + (subtotal[1]+subtotal[3]);
This works for any integer type, but ICC by default assumes that floating-point addition is also associative (gcc and clang require some subset of -ffast-math for that).
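To make the idea concrete, here is a minimal, self-contained sketch of the same reduction written out by hand, including the leftover elements. The helper names `sum_scalar` and `sum_four_lanes` are made up for illustration, and the lane count of 4 simply mirrors the "vector length 4" line in the report. It is not meant to show what ICC actually emits, only the reordering that makes vectorisation legal.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: plain left-to-right sum, i.e. the order
// the original loop spells out.
template <typename T>
T sum_scalar(const std::vector<T>& c) {
    T total = T{};
    for (std::size_t i = 0; i < c.size(); ++i)
        total += c[i];
    return total;
}

// Hypothetical helper: four independent sub-totals ("lanes").
// No lane depends on another, so the inner loop maps onto one
// 4-wide vector add per iteration of the outer loop.
template <typename T>
T sum_four_lanes(const std::vector<T>& c) {
    T subtotal[4] = {T{}, T{}, T{}, T{}};
    const std::size_t n = c.size();
    std::size_t i = 0;

    // Main loop: full groups of 4 only, so c[i + k] stays in bounds.
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            subtotal[k] += c[i + k];

    // Combine the lanes (a "horizontal" add in SIMD terms).
    T total = (subtotal[0] + subtotal[2]) + (subtotal[1] + subtotal[3]);

    // Remainder: at most 3 elements left when n is not a multiple of 4.
    for (; i < n; ++i)
        total += c[i];
    return total;
}
```

For integer element types both helpers return the same value, because reordering the additions cannot change the result. For floating-point types that is not guaranteed: with the `float` input `{1e8f, 1.0f, -1e8f, 1.0f}`, a strictly left-to-right single-precision sum should give 1, while the four-lane order should give 2, since `1e8f + 1.0f` rounds back to `1e8f`. That difference is why a compiler needs permission to treat floating-point addition as associative before it can vectorise such a loop.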
Upvotes: 1