SmacL

Reputation: 22932

Does the VS2008 C++ optimizer sometimes produce slower code?

Following on from a previous question, I've been playing around with optimizer settings in my release build to see what benefits are to be gleaned from compiler optimization. Until now I've been using /Ob1 (only inline where inline is explicitly given) and /Oi (enable intrinsic functions). I tried changing this to include /Ot (favour fast code), /Oy (omit frame pointers) and /Ob2 (inline any suitable), and to my surprise a regression suite that had been taking 2h58m now took 3h16m. My first assumption was that my own inlining was more aggressive than the compiler's, but moving back from /Ob2 to /Ob1 only improved things to 3h12m. I'm still running more tests, but it would appear that in some cases /Ot (favour fast code) is actually slowing things down.

The software is multi-threaded and computation-intensive (surface modelling, manipulation and visualisation), and has already been heavily hand-optimized based on profiler results. It also deals with large amounts of data and uses #pragma pack(4) pretty regularly.
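For context, the packed structures are along these lines (a simplified sketch; the field names are made up, but the size trade-off is the same):

    #include <cstdio>

    // Packing to 4 bytes keeps large arrays compact (28 bytes per element
    // here instead of 32 with default alignment), at the cost of possibly
    // unaligned 8-byte members.
    #pragma pack(push, 4)
    struct SurfacePoint
    {
        double x, y, z;   // 24 bytes
        int    flags;     // 4 bytes, no trailing padding under pack(4)
    };
    #pragma pack(pop)

    int main()
    {
        printf("sizeof(SurfacePoint) = %u\n",
               static_cast<unsigned>(sizeof(SurfacePoint)));
        return 0;
    }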

So the question is this. For a manually optimized program, is compiler optimization in VS2008 liable to do more damage than good? Put another way, are there known, documented scenarios where compiler optimization reduces performance? (N.B. profiling compiler-optimized code is painful, hence profiling to date has been done on unoptimized code.)

Edit: As per Cody Gray's and others' suggestions, I added /O2 to the optimization settings and re-ran my test suite. This resulted in a run time of 3h01m, comparable to the minimally optimized run. Given the (slightly dated) MSDN guidelines on optimization and the post from Goz, I'm going to check /O1 to see whether smaller is actually faster in my case. Note that the current EXE is ~11MB. I'll also try to get a VS2010 build together and see how that fares.

Edit 2: With /O1, the run time was 3h00m and the 11MB EXE was 62KB smaller. Note that the reason behind this post, and the previous linked one, was to check whether the benefits of turning on compiler optimizations outweighed the drawbacks in terms of profiling and debugging. In this specific instance they appear not to, although I admit to being surprised that none of the combinations tried added any benefit, and that some visibly reduced performance. FWIW, as per this previous thread, I tend to do most of my optimization at design time and use the profiler primarily to check design assumptions, so I reckon I'll be sticking with that approach. I'll have one final go with VS2010 with whole program optimization enabled and leave it at that.

Thanks for all the feedback!

Upvotes: 3

Views: 1052

Answers (3)

Puppy

Reputation: 147056

It's well known that "favour fast code" is not always faster than "favour small code". The compiler's heuristics are not omniscient and can make mistakes. In some cases, smaller code ends up being faster than "faster" code, as it were.

Use /O2 for the fastest code; the compiler knows better than you how the various settings interact.

Wait. You profiled unoptimized code? That's insanity. Compiler optimizations are not like manual optimizations: they should always be on, so there's no reason to profile to decide whether to use them, and profiling without them can point you at bottlenecks that don't exist. If you want accurate profiling data, get the compiler to do its absolute best first, and then profile.

You could also look at using Profile-Guided Optimization, which can guide the compiler's optimizer in some impressive ways.

Upvotes: 2

Frédéric Hamidi

Reputation: 263157

The documentation for /Ot states:

If you use /Os or /Ot, then you must also specify /Og to optimize the code.

So you might want to always pass /Og with /Ot in your tests.

That said, /Ot favours fast code at the expense of program size and can produce very large binaries, especially with heavy inlining. Large binaries have a harder time taking advantage of the processor's instruction cache.

Upvotes: 4

Goz

Reputation: 62333

It's quite possible that the compiler is trying to inline large functions, or at least a large number of functions, in a loop. At that point you run the risk of repeatedly reloading the instruction cache, which can cause a big slow-down. Inlining is not always the best thing to do (though more often than not it is helpful). If you have any large loops with lots of function calls in them, it may be better to break the loop into several loops; that way each loop body can stay inside the instruction cache and you get significantly better performance.
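As a rough sketch (the routines here are made-up stand-ins for whatever heavy per-element work the real surface code does):

    #include <vector>
    #include <cstddef>

    struct Point { double x, y, z; };

    // Hypothetical heavy routines standing in for the real per-point work.
    void smooth(Point& p)        { p.x *= 0.5; p.y *= 0.5; p.z *= 0.5; }
    void computeNormal(Point& p) { p.x += p.y + p.z; }
    void updateBounds(Point& p)  { if (p.z < 0.0) p.z = 0.0; }

    // One big loop: if all three calls get inlined, the loop body may no
    // longer fit in the instruction cache.
    void processAllInOne(std::vector<Point>& pts)
    {
        for (std::size_t i = 0; i < pts.size(); ++i)
        {
            smooth(pts[i]);
            computeNormal(pts[i]);
            updateBounds(pts[i]);
        }
    }

    // Split into several loops: each loop body stays small and hot in the
    // instruction cache, at the cost of walking the data more than once.
    void processSplit(std::vector<Point>& pts)
    {
        for (std::size_t i = 0; i < pts.size(); ++i) smooth(pts[i]);
        for (std::size_t i = 0; i < pts.size(); ++i) computeNormal(pts[i]);
        for (std::size_t i = 0; i < pts.size(); ++i) updateBounds(pts[i]);
    }

Whether the split version wins depends on the data: it touches every point three times instead of once, so it only pays off when the combined, inlined loop body is genuinely too big for the instruction cache.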

Upvotes: 2
