SpaceMonkey

Reputation: 4365

Intel compiler cannot vectorize this simple loop?

So I have the following code which seems very simple to me:

#define MODS_COUNT 5

int start1 = <calc at runtime>;
int start2 = <calc at runtime>;

for (int j=0; j<MODS_COUNT; j++) // loop 5 times doing simple addition.
    logModifiers[start1 +  j] += logModsThis[start2 + j];

This loop is part of an outer loop (not sure if this makes a difference).

The compiler says: message : loop was not vectorized: vectorization possible but seems inefficient.

Why can't this loop be vectorised? It seems very simple to me. How can I force vectorisation and check the performance myself?

I have Intel C++ Compiler 2013 update 3.

Full code is here if anyone is interested: http://pastebin.com/Z6H5ZejW

Edit: I understand that the compiler decided that it's inefficient. I'm asking:

Why is it inefficient?

How can I force it so that I can benchmark myself?

Edit2: If I change MODS_COUNT to 4 instead of 5, then the loop gets vectorised. What makes 5 inefficient? I thought it could be done in 2 instructions instead of 5: one vector instruction handling 4 elements and one "normal" scalar instruction handling the remaining 1.

Upvotes: 6

Views: 1487

Answers (2)

Bogi

Reputation: 2598

For vectorization to make sense, the innermost loop has to have a large enough trip count. In your case it is small (5), and the compiler's cost model estimates that the speedup from vectorization would be small or even negative.
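If you want to benchmark the vectorized version anyway, the cost model can usually be overridden with a pragma. Below is a minimal sketch, not taken from the pastebin code: the helper name add_mods is hypothetical, and float is an assumption because the question does not show the element type. Intel's compiler documents `#pragma vector always` as a request to ignore the efficiency heuristics; `#pragma omp simd` is the OpenMP 4.0 spelling and only takes effect when the compiler's OpenMP SIMD support is enabled. Neither guarantees vectorization, so check the vectorization report for your compiler version.

```cpp
const int MODS_COUNT = 5;

// Hypothetical helper wrapping the loop from the question.
// float is an assumption; the question does not show the element type.
void add_mods(float* logModifiers, const float* logModsThis,
              int start1, int start2) {
#if defined(__INTEL_COMPILER)
// Intel-specific: ask the vectorizer to ignore its efficiency heuristics.
#pragma vector always
#else
// OpenMP 4.0: ignored unless OpenMP (SIMD) support is enabled.
#pragma omp simd
#endif
    for (int j = 0; j < MODS_COUNT; j++)
        logModifiers[start1 + j] += logModsThis[start2 + j];
}
```

Time this against the plain loop inside the real outer loop; a micro-benchmark of 5 additions in isolation is unlikely to tell you much.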

I've seen wonders from loop interchange: exchanging the inner and outer loops so that the innermost loop has a large trip count.
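A rough sketch of what loop interchange could look like here. Everything in it is an assumption for illustration: the function names, the float element type, and especially the data layouts, which are not those of the pastebin code. The point is only that the interchanged version gives the vectorizer an inner loop of length rows instead of 5.

```cpp
#include <cstddef>
#include <vector>

const int MODS_COUNT = 5;

// Before: the 5-iteration loop is innermost, so the vectorizer sees a
// trip count of 5.
// Assumed layout: element (i, j) lives at index i * MODS_COUNT + j.
void add_short_inner(std::vector<float>& acc, const std::vector<float>& src,
                     std::size_t rows) {
    for (std::size_t i = 0; i < rows; ++i)
        for (int j = 0; j < MODS_COUNT; ++j)
            acc[i * MODS_COUNT + j] += src[i * MODS_COUNT + j];
}

// After interchange: the long loop over rows is innermost.
// Assumed layout is transposed so the inner loop stays contiguous:
// element (i, j) lives at index j * rows + i.
void add_long_inner(std::vector<float>& acc, const std::vector<float>& src,
                    std::size_t rows) {
    for (int j = 0; j < MODS_COUNT; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            acc[j * rows + i] += src[j * rows + i];
}
```

Whether the real code can be interchanged depends on how start1 and start2 are computed and on the dependencies between outer iterations, which this sketch does not model.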

Upvotes: 0

Koushik Shetty

Reputation: 2176

According to vectorization in Intel compilers:

There are SIMD (single instruction, multiple data) registers which are 128 bits long. So if sizeof(int) is 4, then 4 integers fit in one of these registers and a single instruction can operate on all 4 ints. (This also requires that the same operation is applied to each of the ints, which is true here. Moreover, each element of the array on the LHS depends only on a different element of a different array.)

If there are 8 ints, then two instructions are required (instead of 8 without vectorization).

But if there are 5 (or 6 or 7) ints, that too will require two instructions, which might be no better than the non-vectorized code.
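The 5-element case can be written out by hand as one 4-wide add plus one scalar remainder, which is roughly the split a vectorizer would have to produce. This is a sketch under assumptions: float elements (4 bytes, so 4 per 128-bit register, the same count as the ints above), the hypothetical name add5, and SSE intrinsics from xmmintrin.h with a scalar fallback when SSE is not available. Note that the vector path also needs two unaligned loads and a store around the add, so it is not simply "2 instructions versus 5".

```cpp
#if defined(__SSE__) || defined(_M_X64) || defined(__x86_64__)
#include <xmmintrin.h>
#define ADD5_USE_SSE 1
#endif

// Hypothetical helper: dst[0..4] += src[0..4], split as 4 + 1.
void add5(float* dst, const float* src) {
#ifdef ADD5_USE_SSE
    // One 128-bit add covers elements 0..3; unaligned loads and store
    // because nothing is known about the alignment of dst and src.
    __m128 d = _mm_loadu_ps(dst);
    __m128 s = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, _mm_add_ps(d, s));
#else
    // Scalar fallback for targets without SSE.
    for (int j = 0; j < 4; ++j)
        dst[j] += src[j];
#endif
    // Remainder: the fifth element is handled with a scalar add.
    dst[4] += src[4];
}
```

Whether this beats five scalar adds is something to measure rather than assume, which is the judgment the compiler's cost model is making.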

Further reading: LINK.

Upvotes: 2
