Reputation:
I've been trying to optimize some simple code using two kinds of optimization: loop unrolling and memory aliasing.
My original code:
int paint(char *dst, unsigned n, char *src, char bias)
{
    unsigned i;
    for (i = 0; i < n; i++) {
        *dst++ = bias + *src++;
    }
    return 0;
}
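My optimized code after loop unrolling:
int paint(char *dst, unsigned n, char *src, char bias)
{
    unsigned i;
    /* Two elements per iteration; i + 1 < n keeps an odd n from overrunning. */
    for (i = 0; i + 1 < n; i += 2) {
        *dst++ = bias + *src++;
        *dst++ = bias + *src++;
    }
    /* Handle the leftover element when n is odd. */
    if (i < n) {
        *dst++ = bias + *src++;
    }
    return 0;
}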
After this, how can I optimize the code further with memory aliasing? And are there other good optimizations for this code (like casting the pointers to long pointers to copy faster)?
Upvotes: 0
Views: 225
Reputation: 11311
Are you only concerned about performance? What about correctness?
Judging by the name of your function paint and the variable bias (and using my crystal ball), I guess you need to add with saturation (in case of overflow). This can be done by using intrinsics for paddusb (https://www.felixcloutier.com/x86/paddusb:paddusw): https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=774,433,4179,4179&cats=Arithmetic&text=paddusb
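For illustration, here is a minimal sketch of what a saturating version might look like. The function name paint_saturated and the switch to uint8_t and size_t are assumptions made for this example, not part of the original code. On x86 targets with SSE2 it uses _mm_adds_epu8, the intrinsic that maps to paddusb; elsewhere, and for the last few bytes, it falls back to a scalar clamp.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; _mm_adds_epu8 maps to paddusb */
#endif

/* Hypothetical variant of paint(): unsigned add that clamps at 255
   instead of wrapping around. Name and types are assumptions. */
int paint_saturated(uint8_t *dst, size_t n, const uint8_t *src, uint8_t bias)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* Broadcast bias into all 16 byte lanes. */
    const __m128i vbias = _mm_set1_epi8((char)bias);

    /* 16 bytes per iteration; unaligned loads and stores are used,
       so the pointers need no particular alignment. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        v = _mm_adds_epu8(v, vbias);
        _mm_storeu_si128((__m128i *)(dst + i), v);
    }
#endif

    /* Scalar tail (and the whole loop on targets without SSE2). */
    for (; i < n; i++) {
        unsigned sum = (unsigned)src[i] + bias;
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
    return 0;
}
```

Whether saturation is actually wanted depends on what paint is supposed to do; if wrap-around is intended, the original loop is already correct.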
Upvotes: 1
Reputation: 54325
Optimization in C is easier than this.
cc -Wall -W -pedantic -O3 -march=native -flto source.c
That will unroll any loops that need to be unrolled. Doing your own unrolling, Duff's Device and other tricks are outdated and pretty useless.
As for aliasing, your function takes two char* parameters. If they are guaranteed never to point into the same array, you can use the restrict keyword. That lets the optimizer assume that a write through one pointer never changes what is read through the other, which makes it easier for it to use vectorized instructions.
Check out the assembly produced here: https://godbolt.org/z/xMfebr or https://godbolt.org/z/j1xMYz
Can you manage to do all of that by hand? Probably not.
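As a rough sketch of the restrict suggestion (C99 or later), assuming the caller really does guarantee that the two buffers never overlap. The name paint_restrict, the const qualifier and the size_t count are choices made for this example, not taken from the question.

```c
#include <stddef.h>

/* restrict is a promise from the caller that dst and src do not overlap.
   If that promise is broken, the behavior is undefined. */
int paint_restrict(char *restrict dst, size_t n,
                   const char *restrict src, char bias)
{
    /* Plain indexed loop: left simple on purpose so the compiler
       can unroll and vectorize it on its own at -O3. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = (char)(bias + src[i]);
    }
    return 0;
}
```

Calling it looks the same as before, for example paint_restrict(out, len, in, 10) with two separate arrays in and out.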
Upvotes: 1