solalito
solalito

Reputation: 1209

Inconsistency when profiling my code with gprof

I am using a relatively simple code parallelize with OpenMP to familiarize myself with gprof.

My code mainly consists of gathering data from input files, perform some array manipulations and write the new data to different output files. I placed some calls to the intrinsic subroutine CPU_TIME to see if gprof was being accurate:

PROGRAM main
    USE global_variables
    USE fileio, ONLY: read_old_restart, write_new_restart, output_slice, write_solution
    USE change_vars
    IMPLICIT NONE
    REAL(dp) :: t0, t1

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL allocate_data
    CALL CPU_TIME(t1)
    PRINT*, "Allocate data =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL build_grid
    CALL CPU_TIME(t1)
    PRINT*, "Build grid    =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL read_old_restart
    CALL CPU_TIME(t1)
    PRINT*, "Read restart  =", t1 - t0


    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL regroup_all
    CALL CPU_TIME(t1)
    PRINT*, "Regroup all   =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL redistribute_all
    CALL CPU_TIME(t1)
    PRINT*, "Redistribute  =", t1 - t0

    !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CALL CPU_TIME(t0)
    CALL write_new_restart
    CALL CPU_TIME(t1)
    PRINT*, "Write restart =", t1 - t0
END PROGRAM main

Here is the output:

 Allocate data =  1.000000000000000E-003
 Build grid    =  0.000000000000000E+000
 Read restart  =   10.7963590000000
 Regroup all   =   6.65998700000000
 Redistribute  =   14.3518180000000
 Write restart =   53.5218640000000

Therefore, the write_new_restart subroutine is the most time consuming and takes about 62% of the total run time. However according to grof, the subroutine redistribute_vars, which is called multiple times by redistribute_all is the most time consuming with 70% of the total time. Here is the output from gprof:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 74.40      8.95     8.95       61     0.15     0.15  change_vars_mp_redistribute_vars_
 19.12     11.25     2.30       60     0.04     0.04  change_vars_mp_regroup_vars_
  6.23     12.00     0.75       63     0.01     0.01  change_vars_mp_fill_last_blocks_
  0.08     12.01     0.01        1     0.01     2.31  change_vars_mp_regroup_all_
  0.08     12.02     0.01                             __intel_ssse3_rep_memcpy
  0.08     12.03     0.01                             for_open
  0.00     12.03     0.00        1     0.00    12.01  MAIN__
  0.00     12.03     0.00        1     0.00     0.00  change_vars_mp_build_grid_
  0.00     12.03     0.00        1     0.00     9.70  change_vars_mp_redistribute_all_
  0.00     12.03     0.00        1     0.00     0.00  fileio_mp_read_old_restart_
  0.00     12.03     0.00        1     0.00     0.00  fileio_mp_write_new_restart_
  0.00     12.03     0.00        1     0.00     0.00  global_variables_mp_allocate_data_


index % time    self  children    called     name
                0.00   12.01       1/1           main [2]
[1]     99.8    0.00   12.01       1         MAIN__ [1]
                0.00    9.70       1/1           change_vars_mp_redistribute_all_ [3]
                0.01    2.30       1/1           change_vars_mp_regroup_all_ [5]
                0.00    0.00       1/1           global_variables_mp_allocate_data_ [13]
                0.00    0.00       1/1           change_vars_mp_build_grid_ [10]
                0.00    0.00       1/1           fileio_mp_read_old_restart_ [11]
                0.00    0.00       1/1           fileio_mp_write_new_restart_ [12]
-----------------------------------------------
                                                 <spontaneous>
[2]     99.8    0.00   12.01                 main [2]
                0.00   12.01       1/1           MAIN__ [1]
-----------------------------------------------
                0.00    9.70       1/1           MAIN__ [1]
[3]     80.6    0.00    9.70       1         change_vars_mp_redistribute_all_ [3]
                8.95    0.00      61/61          change_vars_mp_redistribute_vars_ [4]
                0.75    0.00      63/63          change_vars_mp_fill_last_blocks_ [7]
-----------------------------------------------
                8.95    0.00      61/61          change_vars_mp_redistribute_all_ [3]
[4]     74.4    8.95    0.00      61         change_vars_mp_redistribute_vars_ [4]
-----------------------------------------------
                0.01    2.30       1/1           MAIN__ [1]
[5]     19.2    0.01    2.30       1         change_vars_mp_regroup_all_ [5]
                2.30    0.00      60/60          change_vars_mp_regroup_vars_ [6]
-----------------------------------------------
                2.30    0.00      60/60          change_vars_mp_regroup_all_ [5]
[6]     19.1    2.30    0.00      60         change_vars_mp_regroup_vars_ [6]
-----------------------------------------------
                0.75    0.00      63/63          change_vars_mp_redistribute_all_ [3]
[7]      6.2    0.75    0.00      63         change_vars_mp_fill_last_blocks_ [7]
-----------------------------------------------
                                                 <spontaneous>
[8]      0.1    0.01    0.00                 for_open [8]
-----------------------------------------------
                                                 <spontaneous>
[9]      0.1    0.01    0.00                 __intel_ssse3_rep_memcpy [9]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[10]     0.0    0.00    0.00       1         change_vars_mp_build_grid_ [10]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[11]     0.0    0.00    0.00       1         fileio_mp_read_old_restart_ [11]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[12]     0.0    0.00    0.00       1         fileio_mp_write_new_restart_ [12]
-----------------------------------------------
                0.00    0.00       1/1           MAIN__ [1]
[13]     0.0    0.00    0.00       1         global_variables_mp_allocate_data_ [13]
-----------------------------------------------

For your information, regroup_all calls regroup_vars multiple times and redistribute_all calls redistribute_vars and fill_last_blocks multiple times.

I am compiling my code with ifort with the -openmp -O2 -pg options.

QUESTION:

Why is gprof not seeing the time my file i/o subroutines take? (read_old_restart, write_new_restart)

Upvotes: 1

Views: 520

Answers (1)

Mike Dunlavey
Mike Dunlavey

Reputation: 40649

gprof specifically does not include I/O time. It only tries to measure CPU time.

That's because it only does two things: 1) sample the program counter on a 1/100 second clock, and the program counter is meaningless during I/O, and 2) count the number of times any function B is called by any function A.

From the call-counts, it tries to guess how much of each function's CPU time can be attributed to each caller. That's it's whole advance over pre-existing profilers.

When you use gprof, you should understand what it does and what it doesn't do.

Upvotes: 1

Related Questions