Reputation: 24150
I am using Google CPU Profiling tool.
http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html
On the documentation it is given
Analyzing Text Output
Text mode has lines of output that look like this:
14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
Here is how to interpret the columns:
But I am not able to understand which columns tell me exact or percentage CPU usages of function ?
How to get CPU uses of a function suing google profile ?
Upvotes: 4
Views: 3972
Reputation: 94335
Text mode has lines of output that look like this:
It will have a lot of lines, for example, collect profile:
$ CPUPROFILE=a.pprof LD_PRELOAD=./libprofiler.so ./a.out
The program a.out
is the same as here: Kcachegrind/callgrind is inaccurate for dispatcher functions?
Then analyze it with pprof top
command:
$ pprof ./a.out a.pprof
Using local file ./a.out.
Using local file a.pprof.
Welcome to pprof! For help, type 'help'.
(pprof) top
Total: 185 samples
76 41.1% 41.1% 76 41.1% do_4
51 27.6% 68.6% 51 27.6% do_3
37 20.0% 88.6% 37 20.0% do_2
21 11.4% 100.0% 21 11.4% do_1
0 0.0% 100.0% 185 100.0% __libc_start_main
0 0.0% 100.0% 185 100.0% dispatcher
0 0.0% 100.0% 34 18.4% first2
0 0.0% 100.0% 42 22.7% inner2
0 0.0% 100.0% 68 36.8% last2
0 0.0% 100.0% 185 100.0% main
So, what is here: the total sample count is 185; and the Frequency is the default (1 sample every 10 ms; or 100 samples per second). Then total runtime is ~ 1.85 second.
First column is the number of samples, which was taken when a.out works in the given function. If we divide it by Frequency, we will get total time estimation of given function, e.g. do_4
runs for ~0.8 sec
Second column is the sample count in given function divided by total count, or the percentage of this function in total program run time. So do_4
is the slowest function (41% of total program time) and do_1
is only 11% of program runtime. I think you are interested in this column.
Third column is the sum of current and preceding lines; so we can know that 2 slowest functions, do_4
and do_3
totally accounted for 68% of total run time (41%+27%)
4rd and 5th columns are like first and second; but these one will account not only samples of the given function itself, but also samples of all functions called from given, both directly and indirectly. You can see, that main
and all called from it is 100% of total run time (because main
is the program itself; or root of calltree of program) and last2
with its children is 36.8% of runtime (its children in my program are: half of calls to do_4
and half of calls to do_3
= 41.1 + 27.6 /2 = 69.7/2 ~= 34% + some time in the function itself)
PS: there are some other useful pprof commands, like callgrind
or gv
which shows graphic representation of call tree with profiling information added.
Upvotes: 7