Reputation: 21
Before anyone tells me to look up old answers or RTFM, please note that I've already done so, so please read the details before directing me elsewhere.
I've established that the difference between optimization levels isn't as simple as a higher level enabling some additional optimization flags.
For example, I first found the difference in optimization flags of O0 and O1 by following these steps:
gcc -c -Q -O1 --help=optimizers > /tmp/O1-opts
gcc -c -Q -O0 --help=optimizers > /tmp/O0-opts
diff /tmp/O0-opts /tmp/O1-opts | grep enabled
This gave me a list of various optimization flags enabled by O1 over O0.
Then, I compiled the code with -O0 but added all the individual optimization flags enabled by O1 over O0, because the result should be the same as O1, right? Well, guess what, it's not!
So, this proves that the difference between optimization levels is not simply the different types of optimization flags used. I mean there must be more differences in optimizations besides the optimization flags that gcc/g++ displays.
Please let me know if someone already knows the answer to this question, or I'll have to look up the source-code of gcc, which wouldn't be trivial for me. Thank you!
As for why I'm looking for this info: I have some AVX-512 code that experiences less than 3% L1D cache misses with O0 or no optimization flag, but more than 37% with O1 and beyond (although the code runs faster). If I can figure out which (hidden) flag is causing it, I might be able to speed up the code even further. There are too many flags in the common.opt file in the gcc source code, so I've hit a wall.
Upvotes: 2
Views: 2263
Reputation: 365247
-O0 is special, and implies spill/reload between every statement for consistent debugging (see "Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?"). You'll still see vars being stored/reloaded, kind of like volatile; there's no -f option to change that.
-O0 also means to disable optimization in general. -f optimization options don't work at -O0; optimization has to be enabled (-Og or -O1 or higher) for them to do anything. (Except for maybe a couple special cases.) See also another Q&A reporting no difference in asm, and an answer quoting the GCC manual: "Not all optimizations are controlled directly by a flag."
You could maybe use -O1 and use -fno-foo -fno-bar ... to disable the options that -O1 mentions enabling, and get different code-gen from -O0.
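A minimal sketch of how that -fno-... list could be generated from the two --help=optimizers dumps in the question. It assumes each boolean option is printed as a flag name followed by [enabled] or [disabled]; the exact layout varies between GCC versions. The helper names enabled_flags and fno_flags are hypothetical, and options that take a value (such as -falign-loops=) are deliberately skipped.

```python
import re

# Assumption: boolean options appear as "<flag><whitespace>[enabled|disabled]",
# e.g. "  -ftree-ccp            [enabled]". Value-taking options do not match.
_LINE = re.compile(r"^\s*(-f[\w-]+)\s+\[(enabled|disabled)\]\s*$")


def enabled_flags(help_text):
    """Return the set of -f options marked [enabled] in a --help=optimizers dump."""
    flags = set()
    for line in help_text.splitlines():
        m = _LINE.match(line)
        if m and m.group(2) == "enabled":
            flags.add(m.group(1))
    return flags


def fno_flags(o0_text, o1_text):
    """Build -fno-* options for every flag that -O1 enables and -O0 does not."""
    extra = enabled_flags(o1_text) - enabled_flags(o0_text)
    # "-ftree-ccp" becomes "-fno-tree-ccp"
    return sorted("-fno-" + flag[len("-f"):] for flag in extra)
```

Calling fno_flags on the contents of /tmp/O0-opts and /tmp/O1-opts and appending the joined result to a gcc -O1 command line would then let you re-enable flags one at a time. Since not all optimizations are controlled by a flag, the resulting code should not be expected to match -O0 output.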
Options are also visible in GCC's asm comments with -S -fverbose-asm -o- output.
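To compare those comments between two builds, something like the following sketch could pull the option tokens out of the generated asm. It assumes "#" as the comment character (as on x86) and header lines that begin with "# options", possibly followed by continuation comment lines that start with a "-" token; the header wording differs between GCC versions and targets, so treat the parsing rules as assumptions. The function name options_in_asm is hypothetical.

```python
def options_in_asm(asm_text, comment="#"):
    """Collect option tokens from the '# options ...' comment header of GCC asm."""
    opts = []
    in_block = False
    for line in asm_text.splitlines():
        if not line.startswith(comment):
            in_block = False          # real asm line: the header block is over
            continue
        body = line[len(comment):].strip()
        if body.startswith("options "):
            in_block = True           # e.g. "options passed: -O1 ..."
            body = body.split(":", 1)[1] if ":" in body else ""
        elif not (in_block and body.startswith("-")):
            in_block = False          # unrelated comment line
            continue
        # keep only tokens that look like command-line options
        opts.extend(tok for tok in body.split() if tok.startswith("-"))
    return opts
```

Running it on the output of two compilations and taking the set difference shows which options GCC reports differently, though it cannot reveal optimizations that have no flag at all.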
Also, running slower (because of store/reload or any other reason) gives HW prefetch more time to keep up and have data ready in L2 or even L1d before a load uop executes and has a demand miss.
Upvotes: 1