nightcrawler

Reputation: 15

Vectorization of loop with accumulation in variable

I have the following loop, which I'm compiling with icc:

for (int i = 0; i < arrays_size; ++i) {
    total = total + C[i];
}

The vectorization report says this loop has been vectorised, but I don't understand how this is possible, since there is an obvious read-after-write dependence on total.

The report output is the following:

LOOP BEGIN at loops.cpp(46,5)
      remark #15388: vectorization support: reference C has aligned access   [ loops.cpp(47,7) ]
      remark #15305: vectorization support: vector length 4
      remark #15399: vectorization support: unroll factor set to 8
      remark #15309: vectorization support: normalized vectorization overhead 0.475
      remark #15300: LOOP WAS VECTORIZED
      remark #15448: unmasked aligned unit stride loads: 1
      remark #15475: --- begin vector loop cost summary ---
      remark #15476: scalar loop cost: 5
      remark #15477: vector loop cost: 1.250
      remark #15478: estimated potential speedup: 3.990
      remark #15488: --- end vector loop cost summary ---
      remark #25015: Estimate of max trip count of loop=31250
   LOOP END

Can someone explain what this means and how it is possible to vectorise this loop?

Upvotes: 0

Views: 144

Answers (1)

chtz

Reputation: 18807

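Depending on the type of total and C[i], you can exploit the associativity and commutativity of addition and first accumulate 4 or 8 (or more) independent sub-totals. Each sub-total depends only on its own previous value, so the dependence between consecutive iterations disappears and several elements can be added at once: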

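int subtotal[4] = {0, 0, 0, 0};
int i = 0;
// main loop: only full groups of 4, so C[i+k] never reads past the end
for (; i + 4 <= arrays_size; i += 4) {
    for (int k = 0; k < 4; ++k)
        subtotal[k] += C[i + k];
}
// handle remaining elements of C (at most 3)
for (; i < arrays_size; ++i)
    subtotal[0] += C[i];
// sum up the sub-totals:
total = (subtotal[0] + subtotal[2]) + (subtotal[1] + subtotal[3]);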

This works for any integer type. For floating-point types the reordering can change the rounded result, because floating-point addition is not strictly associative, but ICC by default assumes that it is (gcc and clang require some subset of -ffast-math for that).
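
To make the integer/floating-point difference concrete, here is a minimal sketch that sums the same array in both orders. The function names sum_sequential and sum_subtotals4 are made up for this illustration and are not part of ICC or any library. The second function only mimics, in scalar code, the order in which a 4-wide vectorized loop would add the elements.

```cpp
#include <cstddef>

// Hypothetical helper: plain left-to-right sum.
// Each addition depends on the result of the previous one.
template <typename T>
T sum_sequential(const T* c, std::size_t n) {
    T total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += c[i];
    return total;
}

// Hypothetical helper: the same elements, accumulated in 4 independent
// sub-totals and combined at the end (the reordered summation).
template <typename T>
T sum_subtotals4(const T* c, std::size_t n) {
    T subtotal[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    // full groups of 4 only, so c[i + k] stays in bounds
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            subtotal[k] += c[i + k];
    // leftover elements (at most 3)
    for (; i < n; ++i)
        subtotal[0] += c[i];
    return (subtotal[0] + subtotal[2]) + (subtotal[1] + subtotal[3]);
}
```

For an int array such as {1, 2, 3, 4, 5, 6} both functions return 21. For the double array {1e20, 1.0, -1e20, 1.0}, assuming IEEE doubles and a build without fast-math style options, the sequential sum returns 1.0 while the sub-total version returns 2.0, because 1e20 + 1.0 rounds back to 1e20. That difference is why a compiler needs permission to treat floating-point addition as associative before it vectorizes such a loop.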

Upvotes: 1
