Reputation: 131415
I'm using the CUDA 7.0 profiler, nvprof, to profile a process that makes CUDA calls:
$ nvprof -o out.nvprof /path/to/my/app
Later, I generate two traces: the 'API trace' (what happens on the host CPU, e.g. CUDA runtime calls and ranges you mark) and the 'GPU trace' (kernel executions, memsets, H2Ds, D2Hs and so on):
$ nvprof -i out.nvprof --print-api-trace --csv 2>&1 | tail -n +2 > api-trace.csv
$ nvprof -i out.nvprof --print-gpu-trace --csv 2>&1 | tail -n +2 > gpu-trace.csv
Every record in each of the traces has a timestamp (or a start and end time). The thing is, time value 0 in these two traces is not the same: The GPU trace time-0 point seems to signify when the first operation on the GPU triggered by the relevant process begins to execute, while the API trace's time-0 point seems to be the beginning of process execution, or sometime thereabouts.
I've also noticed that when I use nvvp and import out.nvprof, the values are corrected; that is to say, the start time of the first GPU op is not 0, but something more realistic.
How do I obtain the correct offset between the two traces?
Upvotes: 0
Views: 2144
Reputation: 151799
It may not be obvious from the nvprof documentation, but it is possible to specify both --print-gpu-trace and --print-api-trace when requesting output from nvprof, whether you are profiling an app or extracting information from a previously captured profiler output file.
If you are profiling an app, the following should generate a "harmonized" timeline for both API activity and GPU activity:
nvprof --print-gpu-trace --print-api-trace ./my_app
You can save the output using the --log-file option.
Similarly, if you are extracting output from a previously captured output file (not the same thing as a log file), you can do something like the following:
nvprof -i profiler_out_file --print-gpu-trace --print-api-trace ...
where profiler_out_file should be the name of the file you previously saved using the nvprof -o ... option.
Printing both traces with the same command is essential here: it is what makes the two (combined) timelines begin at the same point in time. If you issue two commands, each printing a different trace, the results may not be 'harmonized' in this way.
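Once both traces share a time origin, interleaving them is just a merge by start time. The following is a minimal Python sketch of that step, not a parser for nvprof output: it assumes you have already reduced each trace to (start time, name) pairs sorted by start time, and the Event type, the merge_traces helper, and the sample records are all hypothetical names and made-up values for illustration.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    start: float  # start time in seconds, on the shared timeline
    source: str   # "api" or "gpu"
    name: str     # API call or GPU operation name


def merge_traces(
    api: Iterable[Tuple[float, str]],
    gpu: Iterable[Tuple[float, str]],
) -> List[Event]:
    """Interleave two start-sorted traces into one list ordered by start time.

    Assumes both inputs use the same time-0 point, i.e. they came from a
    single nvprof invocation that printed both traces.
    """
    api_events = (Event(start, "api", name) for start, name in api)
    gpu_events = (Event(start, "gpu", name) for start, name in gpu)
    # heapq.merge is lazy and stable; on equal start times, API events come first.
    return list(heapq.merge(api_events, gpu_events, key=lambda e: e.start))


# Made-up sample records (not real profiler output).
api_trace = [(0.10, "cudaMalloc"), (0.25, "cudaMemcpy"), (0.30, "cudaLaunch")]
gpu_trace = [(0.26, "[CUDA memcpy HtoD]"), (0.31, "my_kernel")]

merged = merge_traces(api_trace, gpu_trace)
for event in merged:
    print(f"{event.start:8.3f}  {event.source:3s}  {event.name}")
```

If the two traces were produced by separate commands instead, the same merge would run without error but the ordering would be meaningless, since the start times would be measured from different origins.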
Upvotes: 3