Reputation: 9319
One of the questions that I asked some time ago had undefined behavior, so compiler optimization was actually causing the program to break.
But if there is no undefined behavior in your code, then is there ever a reason not to use compiler optimization? I understand that sometimes, for debugging purposes, one might not want optimized code (please correct me if I am wrong). Other than that, on production code, why not always use compiler optimization?
Also, is there ever a reason to use, say, -O instead of -O2 or -O3?
Upvotes: 45
Views: 21501
Reputation: 1
Personally, I divide my codebase into three categories: A) pre-existing code from a trusted, reputable source; B) code I developed that is still in development; C) code I developed that is done and tested.
There's also D) pre-existing code from an untrusted, non-reputable, or buggy source, which I generally avoid.
A) I always compile with optimizations. If one of my apps links to Postgres, or more precisely to libpq.so.x, the client library, I clone a stable release, not whatever's on the master branch. The last stable release of Postgres at this point is 16.4, so that's what I grab. My default flags for this type of code are:
-flto -O2 -march=znver4 -fuse-ld=lld -stdlib=libc++
and I pull -lunwind into the linker flags. If something goes wrong, I am definitely not delving into whatever library this is to figure out what the error is, but I report it upstream.
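As a sketch of what such an invocation might look like (the source file and binary names here are hypothetical, not from an actual build):
> clang++ -flto -O2 -march=znver4 -fuse-ld=lld -stdlib=libc++ pg_client.cpp -lpq -lunwind -o pg_client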
B) I always compile those with full debug symbols and without optimizations. The reason is obvious: stepping through the code should be 1:1 between the compiled app and the source, and any optimization gets in the way of that.
C) Once I am certain my app is bug-free and has been tested as such, I try enabling all optimizations as in A) and subject it to the same unit tests the debug version went through. Once I am certain it holds up, I remove the -march flag and compare performance to the -march version, and based on this I make the call whether to compile distinct versions for the architecture of the deployment environment, or to set -march to something like x86-64-v3.
To answer your question more precisely: you compile with optimizations if you're 100% certain the code works, either because you've personally tested it, or because it's a release version of a major app/library that thousands of people use and compile every day and that you can trust to have been tested.
Upvotes: 0
Reputation: 81123
An optimization that is predicated on the idea that a program won't do X will be useful when processing tasks that don't involve doing X, but will be at best counter-productive when performing a task which could be best accomplished by doing X.
Because the C language is used for many purposes, the Standard deliberately allows compilers which are designed for specialized purposes to make assumptions about program behavior that would render them unsuitable for many other purposes. The authors of the Standard allowed implementations to extend the semantics of the language by specifying how they will behave in situations where the Standard imposes no requirements, and expected that quality implementations would seek to do so in cases where their customers would find it useful, without regard for whether the Standard required them to do so.
Programs that need to perform tasks not anticipated or accommodated by the Standard will often need to exploit constructs whose behavior is defined by many implementations, but not mandated by the Standard. Such programs are not "broken", but are merely written in a dialect that the Standard doesn't require all implementations to support.
As an example, consider the following function test and whether it satisfies the following behavioral requirements:
If passed a value whose bottom 16 bits would match those of some power of 17, return the bottom 32 bits of that power of 17.
Do not write to arr[65536] under any circumstances.
The code below would appear to obviously meet the second requirement, but can it be relied upon to do so?
#include <stdint.h>

int arr[65537];

uint32_t doSomething(uint32_t x)
{
    uint32_t i = 1;
    while ((uint16_t)i != x)
        i *= 17;
    if (x < 65536)
        arr[x] = 1;
    return i;
}

void test(uint32_t x)
{
    doSomething(x);
}
If the code is fed to clang with any non-zero optimization level, the generated machine code for test will fail the second requirement when x is 65536, since the generated code will be equivalent to simply arr[x] = 1;. In effect, clang infers from the loop's exit condition that x must be less than 65536, assumes the side-effect-free loop terminates, and removes both the loop and the range check. Clang will perform this "optimization" even at -O1, and none of the normal options for limiting broken optimizations will prevent it, other than those which force C89 or C99 mode.
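In other words, given the declarations above, the optimized test behaves as though it had been written like this (a sketch of equivalent source, not actual compiler output):

void test(uint32_t x)
{
    arr[x] = 1;  /* unconditional store: writes arr[65536] when x == 65536 */
}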
Upvotes: 0
Reputation: 3448
Here is an example of why using an optimization flag can sometimes be dangerous, and why our tests should cover most of the code in order to notice such an error.
Using clang (because gcc makes some optimizations even without an optimization flag, so its output is corrupted either way):
File: a.cpp
#include <stdio.h>
int puts(const char *str) {
    fputs("Hello, world!\n", stdout);
    return 1;
}

int main() {
    printf("Goodbye!\n");
    return 0;
}
Without an optimization flag:
> clang --output withoutOptimization a.cpp; ./withoutOptimization
> Goodbye!
With -O1:
> clang --output withO1 -O1 a.cpp; ./withO1
> Hello, world!
What happens is that at -O1 clang replaces the printf("Goodbye!\n") call with a call to puts("Goodbye!"), which now resolves to the user-defined puts above, so the program prints the wrong message.
Upvotes: 2
Reputation: 40613
Compiler optimisations have two disadvantages:
Some of the optimisations performed by -O3 can result in larger executables. This might not be desirable in some production code.
Another reason not to use optimisations is that the compiler you are using may contain bugs that only exist when it is performing optimisation. Compiling without optimisation can avoid those bugs. If your compiler does contain bugs, a better option might be to report/fix those bugs, to change to a better compiler, or to write code that avoids those bugs completely.
If you want to be able to debug the released production code, then it might also be a good idea not to optimise the code.
Upvotes: 30
Reputation: 146053
In case 2, imagine some OS code that deliberately changes pointer types. The optimizer can assume that objects of the wrong type could not be referenced, keep aliased memory values cached in registers while they change in memory, and get the "wrong"[1] answer.
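As a minimal sketch of the kind of type-punning hazard being described (the function and variable names are illustrative, not taken from any real OS code):

#include <stdio.h>

/* Writes through a float* and an int* that actually alias the same bytes.
   Under strict-aliasing optimization the compiler may assume the two
   pointers cannot refer to the same object, keep *f cached in a register,
   and return 1.0f even though the bytes in memory were overwritten. */
float update(float *f, int *i)
{
    *f = 1.0f;
    *i = 0;
    return *f;
}

int main(void)
{
    float x = 0.0f;
    /* The cast violates the effective-type rules, so the "wrong" answer the
       optimizer produces is technically allowed by the Standard. */
    printf("%f\n", update(&x, (int *)&x));
    return 0;
}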
Case 3 is an interesting concern. Sometimes optimizers make code smaller but sometimes they make it bigger. Most programs are not the least bit CPU-bound and even for the ones that are, only 10% or less of the code is actually computationally-intensive. If there is any downside at all to the optimizer then it is only a win for less than 10% of a program.
If the generated code is larger, then it will be less cache-friendly. This might be worth it for a matrix algebra library with O(n³) algorithms in tiny little loops. But for something with more typical time complexity, overflowing the cache might actually make the program slower. Optimizers can be tuned for all this stuff, typically, but if the program is a web application, say, it would certainly be more developer-friendly if the compiler would just do the all-purpose things and allow the developer to just not open the fancy-tricks Pandora's box.
[1] Such programs are usually not standard-conforming, so the optimizer is technically "correct", but still not doing what the developer intended.
Upvotes: 11
Reputation: 10969
Two big reasons that I have seen arise from floating point math and overly aggressive inlining. The former is caused by the fact that floating point math is extremely poorly defined by the C++ standard. Many processors perform calculations using 80 bits of precision, for instance, only dropping down to 64 bits when the value is put back into main memory. If one version of a routine flushes that value to memory frequently, while another only grabs the value once at the end, the results of the calculations can be slightly different. Just tweaking the optimizations for that routine may well be a better move than refactoring the code to be more robust to the differences.
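As a rough sketch of the kind of sensitivity being described (whether the two values actually differ depends on the target, the optimization level, and whether intermediates are kept in 80-bit registers):

#include <stdio.h>

int main(void)
{
    double a = 0.1, b = 0.2;
    double sum = a + b;            /* may be held at extended precision in a register */
    volatile double stored = sum;  /* forces a round-trip through 64-bit memory */
    /* On x87 targets this comparison is not guaranteed to hold, and its
       result can change between optimization levels. */
    printf("%d\n", sum == stored);
    return 0;
}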
Inlining can be problematic because, by its very nature, it generally results in larger object files. Perhaps this increase in code size is unacceptable for practical reasons: the code needs to fit on a device with limited memory, for instance. Or perhaps the increase in code size makes the code slower. If a routine becomes big enough that it no longer fits in cache, the resultant cache misses can quickly outweigh the benefits inlining provided in the first place.
I frequently hear of people who, when working in a multi-threaded environment, switch from a debug build to an optimized build and immediately encounter hordes of new bugs due to newly uncovered race conditions and the like. The optimizer merely revealed the underlying buggy code here, though, so turning it off in response is probably ill advised.
Upvotes: 3
Reputation: 15327
If there is no undefined behavior, but there is definite broken behavior (either deterministic normal bugs, or indeterminate like race-conditions), it pays to turn off optimization so you can step through your code with a debugger.
Typically, when I reach this kind of state, I like to use a combination of approaches. If the bug is more devious, I pull out valgrind and drd, and add unit tests as needed, both to isolate the problem and to ensure that, when the problem is found, the solution works as expected.
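For example (the binary name here is hypothetical), a memcheck pass followed by a drd pass might look like:
> valgrind --tool=memcheck ./myapp
> valgrind --tool=drd ./myapp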
In some extremely rare cases, the debug code works but the release code fails. When this happens, the problem is almost always in my code; aggressive optimization in release builds can reveal bugs caused by misunderstood lifetimes of temporaries and the like. But even in this kind of situation, having a debug build helps to isolate the issues.
In short, there are some very good reasons why professional developers build and test both debug (non-optimized) and release (optimized) binaries. IMHO, having both debug and release builds pass unit-tests at all times will save you a lot of debugging time.
Upvotes: 37
Reputation: 529
The reason is that you develop one application (the debug build) while your customers run a completely different application (the release build). If testing resources are low and/or the compiler you use is not very popular, I would disable optimization for release builds.
MS publishes numerous hotfixes for optimization bugs in their MSVC x86 compiler. Fortunately, I've never encountered one in real life. But this was not the case with other compilers: the SH4 compiler in MS Embedded Visual C++ was very buggy.
Upvotes: 4
Reputation: 360592
One example is short-circuit boolean evaluation. Something like:
if (someFunc() && otherFunc()) {
...
}
A 'smart' compiler might realize that someFunc will always return false for some reason, making the entire statement evaluate to false, and decide not to call otherFunc to save CPU time. But if otherFunc contains some code that directly affects program execution (maybe it resets a global flag or something), it now won't perform that step and your program enters an unknown state.
Upvotes: -13
Reputation: 2787
This just happened to me. The code generated by SWIG for interfacing with Java is correct, but won't work with -O2 on gcc.
Upvotes: 2