Reputation: 51
I'm trying to use loop unrolling to optimize my code.
This was the original code
int a[N]; //arbitrary array
int vara; //arbitrary variable
int varb; //arbitrary variable
for (int i=0;i<N;i++)
a[i]=(a[i+1]* vara) + varb;
so I tried doing this
for (int i=0;i<N-1;i+=2)
{
int a=a[i+1]*vara;
int b=a[i+2]*vara;
int c=a+varb;
int d=b+varb;
a[i]=c;
a[i+1]=d;
}
I thought this would work because I'm enabling the compiler to do addition and multiplication for multiple iterations at a time, which I thought would increase instruction level parallelism. Yet doing this does not speed up my code at all, what am I doing wrong?
Any other suggestions to optimize this code would also be much appreciated.
Upvotes: 0
Views: 1531
Reputation: 4992
Your compiler very likely does unrolling already at high optimization levels, maybe you need -funroll-loops
or something like it. But even the docs warn that this isn't a magic option to gain speed, as it costs instruction cache and program space.
Loop unrolling is basically what you've done:just have fewer loop iterations and do the work of multiple smaller iterations. Whether or not it's faster is highly dependent on the loop body and the actual machine the code is run on.
Unrolling also really only makes sense if jumps are expensive and there's an instruction level paralleism gain, which given the anti-dependency and the tuned branch predictors in modern processors is unlikely.
That said, you need to at the very least run some microbenchmarking with statistical analysis.
If I had to hazard a way for you to improve the speed on this: remove the dependency on the next element in the array. This then turns into a basic vector multiply-accumulate, which is trivial to vectorize.
Upvotes: 1