Reputation: 234

gprof gives no output

I am trying to use gprof to profile some numerical code I am developing, but gprof seems to fail to collect data from my program. Here is my command line:

g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test

The gmon.out file is created, but it seems to have no data. When i run

gprof -b fftw_test gmon.out > gprof.out

All I get is

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    


                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name


Index by function name

Any insights?

The code does a lot of stuff, it does not simply call FFTW routines. It has functions that compute certain complex coefficients, functions that multiply input data by these coefficients, and so on.

Edit.: Including example code and results.

#include <cstdlib>
#include <ctime>

int main()
{
   std::srand( std::time( 0 ) );

   double sum = 0.0;

   for ( int i = 0; i < RAND_MAX; ++i )
      sum += std::rand() / ( double ) RAND_MAX;

   std::cout << sum << '\n';

   return 0;
}

Command lines:

$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && ./gprof_test
1.07374e+09
$ gprof -b gprof_test gmon.out > gprof.out
$ cat gprof.out

Result:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    


                        Call graph


granularity: each sample hit covers 2 byte(s) no time propagated

index % time    self  children    called     name


Index by function name

And that's it.

Upvotes: 9

Answers (3)

Kiril

Reputation: 369

I presume the problem comes from the fact you are using O3 level of optimisation. With gcc-8.4.0 I get nothing with O3, limited data (e.g. number of function calls missing) with O2 and proper profile for O1 and O0.

This seems to have been a known bug with older versions of gcc, yet I did not come across any sources regarding such a problem with later versions. I can only hypothesize whether it is a bug with the compiler or whether more aggressive optimisations prevent some performance data from being collected.

Upvotes: 1

jaymmer - Reinstate Monica

Reputation: 4348

If you're using gcc 6, you're most likely running into this bug (note that the bug is not specific to Debian but depends on how gcc was built). A workaround is simply to compile using the `-no-pie' option which disables position-independent code generation.

This is a good start if you want to know more about PIE.

Upvotes: 8

osgx

Reputation: 94445

gprof seems to fail to collect data from my program. Here is my command line:
g++ -Wall -O3 -g -pg -o fftw_test fftw_test.cpp -lfftw3 -lfftw3_threads -lm && ./fftw_test

Your program uses fftw library and probably consist almost only of fftw library calls. What is the running time? Your program may be too fast to be profiled with gprof. Update And library may be not seen by gprof as it was compiled without gprof profiling enabled.

GNU gprof has two parts. First, it instruments function calls in c/cpp files which were compiled with -pg option (with mcount function calls - https://en.wikipedia.org/wiki/Gprof) - to get caller/callee info. Second, it links additional profiling library into your executable to add periodic sampling to find which code was executed for more time. Sampling is done with profil (setitimer). Setitimer profiling has limited resolution and can't resolve intervals less than 10 ms or 1 ms (100 or 1000 samples per second).

And in your example, the fftw library was probably compiled without instrumentation, so no mcount calls in it. It still can be captured by sampling part, but only for main thread of the program (https://en.wikipedia.org/wiki/Gprof - "typically it only profiles the main thread of application").

perf profiler have no instrumentation with mcount (it gets callee/caller from stack unwinding when recorded with -g option), but it have much better statistics/sampling variants (it can use hardware PMU counters), without 100 or 1000 Hz limit, and it supports (profiles) threads correctly. Try perf record -F1000 ./fftw_test (with 1 kHz sampling frequency) and perf report or perf report > report.txt. There are some GUI/HTML frontends to perf too: https://github.com/KDAB/hotspot https://github.com/jrfonseca/gprof2dot

For better setitimer style profiler check google-perftools https://github.com/gperftools/gperftools for "CPU PROFILER".

======

With your test I have some gprof results on Debian 8.6 Linux kernel version 3.16.0-4-amd64 x86_64 machine, g++ (Debian 4.9.2-10), gprof is "GNU gprof (GNU Binutils for Debian) 2.27"

$ cat gprof_test.cpp
#include <cstdlib>
#include <ctime>
#include <iostream>
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
5.00069e+06
real    0m0.992s
$ gprof -b gprof_test gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00        1     0.00     0.00  _GLOBAL__sub_I_main

So, gprof did not catch any time samples in this 1 second example and have no information about calls into the libraries (they were compiled without -pg). After adding some wrapper functions and prohibiting inline optimization I have some data from gprof but library time was not accounted (it sees 0.72 seconds of 2 second runtime):

$ cat *cpp
#include <cstdlib>
#include <ctime>
#include <iostream>

int rand_wrapper1()
{
  return std::rand();
}
int rand_scale1()
{
  return rand_wrapper1() / ( double ) RAND_MAX;
}
int main()
{
   std::srand( std::time( 0 ) );
   double sum = 0.0;
   for ( int i = 0; i < 100000000; ++i )
    sum+= rand_scale1();
//      sum += std::rand() / ( double ) RAND_MAX;
   std::cout << sum << '\n';
   return 0;
}
$ g++ -Wall -O3 -fno-inline -g -pg -o gprof_test gprof_test.cpp && time ./gprof_test
real    0m2.345s
$ gprof -b gprof_test gmon.out
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
 80.02      0.57     0.57                             rand_scale1()
 19.29      0.71     0.14 100000000     1.37     1.37  rand_wrapper1()
  2.14      0.72     0.02                             frame_dummy
  0.00      0.72     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z13rand_wrapper1v
  0.00      0.72     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int) [clone .constprop.0]


                        Call graph


granularity: each sample hit covers 2 byte(s) for 1.39% of 0.72 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]     97.9    0.57    0.14                 rand_scale1() [1]
                0.14    0.00 100000000/100000000     rand_wrapper1() [2]
-----------------------------------------------
                0.14    0.00 100000000/100000000     rand_scale1() [1]
[2]     19.0    0.14    0.00 100000000         rand_wrapper1() [2]

And perf sees all parts:

$ perf record ./gprof_test
0
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.388 MB perf.data (~16954 samples) ]
$ perf report |more   
# Samples: 9K of event 'cycles'
# Event count (approx.): 7373484231
#
# Overhead     Command      Shared Object                     Symbol
# ........  ..........  .................  .........................
#
    25.91%  gprof_test  gprof_test         [.] rand_scale1()
    21.65%  gprof_test  libc-2.19.so       [.] __mcount_internal
    13.88%  gprof_test  libc-2.19.so       [.] _mcount
    12.54%  gprof_test  gprof_test         [.] main
     9.35%  gprof_test  libc-2.19.so       [.] __random_r
     8.40%  gprof_test  libc-2.19.so       [.] __random
     3.97%  gprof_test  gprof_test         [.] rand_wrapper1()
     2.79%  gprof_test  libc-2.19.so       [.] rand
     1.41%  gprof_test  gprof_test         [.] mcount@plt
     0.03%  gprof_test  [kernel.kallsyms]  [k] memset

Upvotes: 0

gprof gives no output

Answers (3)

Related Questions