Vivek Goel
Vivek Goel

Reputation: 24150

How to find CPU Usage in google profiler

I am using Google CPU Profiling tool.

http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html

On the documentation it is given

Analyzing Text Output

Text mode has lines of output that look like this:

   14   2.1%  17.2%       58   8.7% std::_Rb_tree::find

Here is how to interpret the columns:

But I am not able to understand which columns tell me exact or percentage CPU usages of function ?

How to get CPU uses of a function suing google profile ?

Upvotes: 4

Views: 3972

Answers (1)

osgx
osgx

Reputation: 94335

Text mode has lines of output that look like this:

It will have a lot of lines, for example, collect profile:

$ CPUPROFILE=a.pprof LD_PRELOAD=./libprofiler.so ./a.out

The program a.out is the same as here: Kcachegrind/callgrind is inaccurate for dispatcher functions?

Then analyze it with pprof top command:

$ pprof ./a.out a.pprof
Using local file ./a.out.
Using local file a.pprof.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 185 samples
      76  41.1%  41.1%       76  41.1% do_4
      51  27.6%  68.6%       51  27.6% do_3
      37  20.0%  88.6%       37  20.0% do_2
      21  11.4% 100.0%       21  11.4% do_1
       0   0.0% 100.0%      185 100.0% __libc_start_main
       0   0.0% 100.0%      185 100.0% dispatcher
       0   0.0% 100.0%       34  18.4% first2
       0   0.0% 100.0%       42  22.7% inner2
       0   0.0% 100.0%       68  36.8% last2
       0   0.0% 100.0%      185 100.0% main

So, what is here: the total sample count is 185; and the Frequency is the default (1 sample every 10 ms; or 100 samples per second). Then total runtime is ~ 1.85 second.

First column is the number of samples, which was taken when a.out works in the given function. If we divide it by Frequency, we will get total time estimation of given function, e.g. do_4 runs for ~0.8 sec

Second column is the sample count in given function divided by total count, or the percentage of this function in total program run time. So do_4 is the slowest function (41% of total program time) and do_1 is only 11% of program runtime. I think you are interested in this column.

Third column is the sum of current and preceding lines; so we can know that 2 slowest functions, do_4 and do_3 totally accounted for 68% of total run time (41%+27%)

4rd and 5th columns are like first and second; but these one will account not only samples of the given function itself, but also samples of all functions called from given, both directly and indirectly. You can see, that main and all called from it is 100% of total run time (because main is the program itself; or root of calltree of program) and last2 with its children is 36.8% of runtime (its children in my program are: half of calls to do_4 and half of calls to do_3 = 41.1 + 27.6 /2 = 69.7/2 ~= 34% + some time in the function itself)

PS: there are some other useful pprof commands, like callgrind or gv which shows graphic representation of call tree with profiling information added.

Upvotes: 7

Related Questions