fiftyplus

Reputation: 561

how to compare different optimization level files from gprof

Hi everyone, I am running gprof to check the percentage of execution time at two different optimization levels (-g -pg vs. -O3 -pg).

The result I got is that one function takes 68% of the execution time in the -O3 build, but only 9% in the -g version.

I am not sure how to find out the reason behind it. I am thinking of comparing the compiled output of the two versions, but I am not sure which command to use to do so.

Is there any other method to find out the reason for this difference in execution time?

Upvotes: 2

Views: 2862

Answers (2)

Douglas B. Staple

Reputation: 10946

You have to be careful interpreting gprof/profiling results when you're using optimization flags. Compiling with -O3 can really change the structure of your code, so that it's impossible for gprof to tell how much time is spent where.

In particular, function inlining enabled at the higher optimization levels means that some of your functions are completely replaced by inline code, so they don't appear to take any time at all. The time that would have been spent in those child functions is then attributed to the parent functions that call them, so it can look like the time spent in a given parent function has actually increased.
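
As a rough illustration (a made-up example, not your code): with -O0 -pg, gprof will typically report time spent inside square() itself, but with -O3 -pg the call is usually inlined, so square() disappears from the profile and its time is charged to sum_squares() instead.

    /* inline_demo.c -- hypothetical example of the inlining effect.
     * Build and profile, for instance:
     *   gcc -O0 -pg -o demo_O0 inline_demo.c && ./demo_O0 && gprof demo_O0 gmon.out
     *   gcc -O3 -pg -o demo_O3 inline_demo.c && ./demo_O3 && gprof demo_O3 gmon.out
     */
    #include <stdio.h>

    static double square(double x)        /* trivially inlinable at -O3 */
    {
        return x * x;
    }

    static double sum_squares(long n)
    {
        double s = 0.0;
        for (long i = 0; i < n; i++)
            s += square((double)i);       /* charged to square() at -O0, folded into sum_squares() at -O3 */
        return s;
    }

    int main(void)
    {
        printf("%f\n", sum_squares(100000000L));
        return 0;
    }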

I couldn't find a really good reference for this. Here's one old example:
http://gcc.gnu.org/ml/gcc/1998-04/msg00591.html
That being said, I would expect this kind of strange behavior when running gprof with -O3. I always do profiling with just -O1 optimization to minimize these kinds of effects.
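
For reference, the kind of workflow I mean is roughly the following (the program name is just a placeholder):

    gcc -O1 -pg -o myprog myprog.c
    ./myprog                  # writes gmon.out in the current directory
    gprof myprog gmon.out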

Upvotes: 5

Matteo Italia

Reputation: 126867

I think there's a fundamental flaw in your reasoning: you assume that because the function takes 68% of execution time in the optimized version versus just 9% in the unoptimized one, the unoptimized version must perform better.

I'm quite sure, instead, that the -O3 version performs better in absolute terms, but the optimizer did a much better job on the other functions; so, in proportion to the rest of the optimized code, the given subroutine looks slower, even though it's actually as fast as, or faster than, the unoptimized version.
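
To put some made-up numbers on that: if the -g build takes 20 s of user time and the function accounts for 9% of it, that is about 1.8 s inside the function; if the -O3 build takes 2.5 s and the function accounts for 68%, that is about 1.7 s. The percentage went way up, but the absolute time spent in the function did not; everything around it simply got much faster.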

Still, to check the differences in the emitted code directly, you can use the -S switch. Also, to see whether my idea is correct, you can roughly compare the CPU time taken by the function at -O0 vs -O3 by multiplying that percentage by the user time reported by a command like time (also, I'm quite sure you can obtain a measure of the absolute time spent in a subroutine from gprof; IIRC it's even in the default output).
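
Concretely, something along these lines (file names are placeholders) lets you dump and compare the generated assembly for the two builds:

    gcc -O0 -S myfile.c -o myfile_O0.s
    gcc -O3 -S myfile.c -o myfile_O3.s
    diff myfile_O0.s myfile_O3.s

Be warned that the two listings will differ almost everywhere, so this is more useful for inspecting one specific function than for a line-by-line comparison.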

Upvotes: 0
