Reputation: 328
An easy example:
auto f(double*a,unsigned long const N)
{
for(auto i(0);i!=N;++i) a[i]+=2*i;
}
I then compile the source with g++ -std=c++1z -O2 -march=native -ftree-vectorize -fopt-info -S. The output shows: note loop vectorized. That is good.
Next, I want to apply more aggressive optimization to this one function, so I write:
__attribute__((optimize("unroll-loops"))) auto f(double*a,unsigned long const N)
{
for(auto i(0);i!=N;++i) a[i]+=2*i;
}
I compile it with the same command, g++ -std=c++1z -O2 -march=native -ftree-vectorize -fopt-info -S. This time the output only shows: note loop unroll 7 times. Checking the asm file confirms that gcc performs the unroll-loops optimization but ignores the -ftree-vectorize given on the command line.
I also tried:
#pragma GCC optimize("unroll-loops")
auto f(double*a,unsigned long const N)
{
for(auto i(0);i!=N;++i) a[i]+=2*i;
}
That still does not work. So my question is: how can I keep the command-line options and add extra optimization flags for a specific function?
I am using g++ 5.2 on x86-64 Linux, and the CPU supports AVX2.
Upvotes: 3
Views: 826
Reputation: 3917
From the GCC documentation...
optimize
The optimize attribute is used to specify that a function is to be compiled with different optimization options than specified on the command line. Arguments can either be numbers or strings. Numbers are assumed to be an optimization level. Strings that begin with O are assumed to be an optimization option, while other options are assumed to be used with a -f prefix. You can also use the ‘#pragma GCC optimize’ pragma to set the optimization options that affect more than one function. See Function Specific Option Pragmas, for details about the ‘#pragma GCC optimize’ pragma.
So the optimize attribute and #pragma GCC optimize are not additive with the command-line options. You have to pass every optimization option you need explicitly to the attribute.
For example...
__attribute__((optimize("O2", "tree-vectorize", "unroll-loops"))) auto f(double*a,unsigned long const N)
{
for(auto i(0);i!=N;++i) a[i]+=2*i;
}
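The same idea can be sketched with the pragma form, for cases where the options should cover a region of a file rather than a single function. This is a minimal sketch, assuming (as described above) that the full option list has to be restated; the function name add_ramp and the use of std::size_t for the index are illustrative choices, not taken from the question.

```cpp
#include <cstddef>

// Save the optimization options currently in effect (from the command line).
#pragma GCC push_options
// Restate every option wanted for this region, not only the extra one.
#pragma GCC optimize("O2", "tree-vectorize", "unroll-loops")

// Hypothetical name; same computation as f in the question.
void add_ramp(double* a, std::size_t n)
{
    // The index has the same unsigned type as the bound n.
    for (std::size_t i = 0; i != n; ++i)
        a[i] += 2.0 * static_cast<double>(i);
}

// Restore the saved options for the rest of the translation unit.
#pragma GCC pop_options
```

Compiling with -fopt-info, as in the question, is a way to check which optimizations were actually applied to the function; the result may differ between GCC versions.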
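However, you may get better results from profile-guided optimization (PGO, i.e. building with -fprofile-generate, running the program, then rebuilding with -fprofile-use) than from explicitly forcing the compiler to use specific optimizations.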
Upvotes: 2