Grzenio

Reputation: 36649

C++ profiling and optimization

I have some issues with the performance of my application. I found this answer on Stack Overflow, which I like: https://stackoverflow.com/a/378024/5363

One bit I don't really understand is what the relation is between code optimization and profiling. Obviously one wants to profile optimized code, but at the same time a lot of information is lost during optimization. So is it practical to run optimized code in a debugger and break into it, as suggested in the quoted answer?

I am using CMake with gcc under Linux, if this makes any difference.
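
For concreteness, one possible setup (a sketch of a standard out-of-source CMake build, not necessarily exactly mine) that keeps debug symbols in an optimized binary:

    # RelWithDebInfo roughly corresponds to -O2 -g with gcc:
    # optimized code that still carries symbolic information.
    mkdir -p build && cd build
    cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
    make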

Upvotes: 3

Views: 3710

Answers (3)

Mike Dunlavey

Reputation: 40669

As Schumi said, you can use something like pstack to get stack samples. However, what you really need to know is why the program was spending that instant of time when the sample was taken. Maybe you can figure that out from a stack of function names alone. It's better if you can also see the lines of code where the calls occurred. It's better still if you can see the argument values and the data context. The reason is that, contrary to the popular conception that you are looking for "hot spots", "slow methods", "bottlenecks" - i.e. a measurement-based perspective - the most valuable thing to look for is things being done that could be eliminated.

In other words, when you halt the program in the debugger, consider whatever it is doing as if it were a bug. Try to find a way not to do that thing. However, resist doing this until you take another sample and see it doing the same thing - however you describe that thing. Now you know it's taking significant time. How much time? It doesn't matter - you'll find out after you fix it. You do know that it's a lot. The fewer samples you had to take before seeing it twice, the bigger it is.
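
A minimal sketch of one such pause with gdb (assuming you attach to a running process whose id is <pid>):

    gdb -p <pid>       # attach; gdb halts the program at a random instant
    (gdb) bt           # the call stack: functions, source lines, callers
    (gdb) frame 2      # move to a frame of interest
    (gdb) info args    # argument values in that frame
    (gdb) info locals  # the data context
    (gdb) continue     # resume; interrupt again (Ctrl-C) for the next sample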

Then there's a "magnification effect". After you fix that "speed bug" the program will take a lot less time - but - that wasn't the only one. There are others, and now they take a larger fraction of the time. So do it all again. By the time you finish this, if the program is any bigger than a toy, you could be amazed at how much faster it is. Here's a 43x speedup. Here's a 730x speedup. Here's the dreary math behind it.

You see, the problem with tools is you're paying a price for that ease of sampling. Since you're thinking of it as measurement, you're not concentrating on the reasons why the code is doing what it's doing - dubious reasons. That causes you to miss opportunities to make the code faster, causing you to miss the magnification effect, causing you to stop far short of your ultimate possible speedup.

EDIT: Apologies for the flame. Now to answer your question - I do not turn on compiler optimization until the very end, because it can mask bigger problems. Then I try to do a build that has optimization turned on, but still has symbolic information so the debugger can get a reasonable stack trace and examine variables. When I've hit diminishing speedup returns, I can see how much difference the optimizer made just by measuring overall time - don't need a profiler for that.
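
And for that final measurement, the shell's time command is enough (a sketch; the file name and flags are placeholders):

    g++ -O0 -g app.cpp -o app_debug    # build used while hunting speed bugs
    g++ -O2 -g app.cpp -o app_fast     # optimized build, symbols kept
    time ./app_debug
    time ./app_fast                    # the difference is the optimizer's contribution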

Upvotes: 1

Stephane Rolland

Reputation: 39906

The general law at work is the Pareto principle, the 80/20 rule:

  • 20% of the causes produce 80% of the consequences.

By profiling, you are going to identify the 20% of causes that matter most in making your application slow, memory-hungry, or whatever else you are fighting. And if you fix those causes, you'll tackle 80% of the slowness, memory consumption, etc.

Of course the figures are just figures, given to convey the spirit of it:

  • Focus only on the real main causes, and keep improving until you're satisfied with the optimization.

Technically, with gcc under Linux, an answer to the question you are referring to, "How can I profile C++ code running in Linux?", suggests, in a nutshell:
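
The exact commands are not quoted above; one workflow commonly associated with that thread (shown here as a sketch, not as a quote of that answer) is a callgrind run:

    valgrind --tool=callgrind ./app    # writes callgrind.out.<pid>
    kcachegrind callgrind.out.<pid>    # browse caller/callee costs graphically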

Upvotes: 5

Schumi Factor

Reputation: 11

If you need to collect stack samples, why do it through a debugger? Run pstack at regular time intervals. You can redirect the output to a different file for each run and analyze those files later. By looking at the call stacks in these files, you may figure out the hot functions. You do not need a debug binary and can do the above on a fully optimized binary.
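
A minimal sketch of that sampling loop (assuming the target's process id is <pid> and a 5 second interval):

    for i in $(seq 1 20); do
        pstack <pid> > stack_sample_$i.txt   # one call-stack snapshot per file
        sleep 5
    done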

I would prefer using a profiler tool to doing the above, or to doing what is listed in the thread you refer to. A profiler quickly pinpoints the top hot functions, and you can understand the call stacks by looking at the caller/callee graph. I would rather spend time understanding the caller/callee graph than analyzing random stacks with the above method.
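
For illustration (no particular tool is named above; this is just one possibility), Linux perf gives this kind of caller/callee view:

    perf record -g ./app    # sample the program with call-graph information
    perf report             # interactive report; expand entries to see caller/callee chains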

Upvotes: 1
