Reputation: 209
Is there a way to get the kernel execution time in nvprof, the same way you request a metric?
For example, to get the DRAM read transactions I type:
nvprof --metrics dram_read_transactions ./myprogram
My question is: is there something like
nvprof --metrics execution_time ./myprogram
I would like to collect a small set of metrics, execution time included, with a single command line instead of having to run
nvprof ./myprogram
as a separate command.
Upvotes: 3
Views: 3139
Reputation: 132240
You should read this post on NVIDIA's "CUDA Pro Tip" blog:
CUDA Pro Tip: nvprof is Your Handy Universal GPU Profiler
It walks you through the basics of using nvprof to profile and time your application. Specifically, if you run something like:
nvprof --print-gpu-trace ./nbody --benchmark -numdevices=2 -i=1
(the example is for an n-body physics problem simulator), your output will include something like the following:
...
==4125== Profiling application: ./nbody --benchmark -numdevices=2 -i=1
==4125== Profiling result:
Start     Duration  Grid Size  Block Size  Regs*  SSMem*  DSMem*    Size  Throughput  Device           Context  Stream  Name
260.78ms  864ns     -          -           -      -       -         4B    4.6296MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
260.79ms  960ns     -          -           -      -       -         4B    4.1667MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
260.93ms  896ns     -          -           -      -       -         4B    4.4643MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
260.94ms  672ns     -          -           -      -       -         4B    5.9524MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
268.03ms  1.3120us  -          -           -      -       -         8B    6.0976MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
268.04ms  928ns     -          -           -      -       -         8B    8.6207MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
268.19ms  864ns     -          -           -      -       -         8B    9.2593MB/s  Tesla K20c (0)   2        2       [CUDA memcpy HtoD]
268.19ms  800ns     -          -           -      -       -         8B    10.000MB/s  GeForce GTX 680  1        2       [CUDA memcpy HtoD]
274.59ms  2.2887ms  (52 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           Tesla K20c (0)   2        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [242]
274.67ms  981.47us  (32 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           GeForce GTX 680  1        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [257]
276.94ms  2.3146ms  (52 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           Tesla K20c (0)   2        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [275]
276.99ms  979.36us  (32 1 1)   (256 1 1)   36     0B      4.0960KB  -     -           GeForce GTX 680  1        2       void integrateBodies(vec4::Type*, vec4::Type*, vec4::Type*, unsigned int, unsigned int, float, float, int) [290]
This trace lists the timing of every kernel launch and memory copy; the Duration column is the execution time of each individual launch.
It's also useful to run nvprof --help and spend 5-10 minutes reading through the options; for example, you'll find the --csv switch, which prints the trace in CSV format so you can process it in a script.
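As a rough illustration of that kind of script, here is a minimal Python sketch that sums the Duration column per kernel from a CSV trace. It assumes the trace was saved with something like nvprof --csv --print-gpu-trace --log-file trace.csv ./myprogram, that status lines start with "==", and that the CSV has "Duration" and "Name" columns followed by a units row. The embedded sample and the function name kernel_durations are made up for the example, so check them against your own nvprof output before relying on this.

```python
import csv
import re

# Hypothetical, shortened sample of a CSV GPU trace. The real file has more
# columns; the layout shown here is an assumption, not verified nvprof output.
SAMPLE = '''==4125== Profiling application: ./myprogram
==4125== Profiling result:
"Start","Duration","Name"
ms,us,
260.78,0.864,"[CUDA memcpy HtoD]"
274.59,2288.7,"void integrateBodies(float*, int) [242]"
276.94,2314.6,"void integrateBodies(float*, int) [275]"
'''


def kernel_durations(text):
    """Return {kernel name: summed duration}, in whatever unit the trace uses."""
    # Drop blank lines and nvprof's own "==PID==" status lines.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("==")]
    reader = csv.reader(lines)
    header = next(reader)
    dur_col = header.index("Duration")
    name_col = header.index("Name")

    totals = {}
    for row in reader:
        try:
            duration = float(row[dur_col])
        except (ValueError, IndexError):
            continue  # units row or malformed line
        name = row[name_col]
        if name.startswith("[CUDA"):
            continue  # memory copies and memsets, not kernels
        # Strip the trailing launch id such as " [242]".
        name = re.sub(r"\s*\[\d+\]$", "", name)
        totals[name] = totals.get(name, 0.0) + duration
    return totals


if __name__ == "__main__":
    for name, total in kernel_durations(SAMPLE).items():
        print(name, total)
```

To use it on a real trace, read the log file into a string and pass it to kernel_durations. Note that nvprof may pick a different time unit for each run, so compare totals across runs only after checking the units row.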
Upvotes: 1
Reputation: 763
I believe you are looking for: nvprof --print-gpu-trace ./myprogram
Upvotes: 2