Reputation: 31
Profiling CUDA programs with nvprof.
I have described this problem before in "How to collect the event value every time the kernel function been invocated?", but I will restate it here.
Running either of the following commands:

nvprof --events tex0_cache_sector_queries --replay-mode kernel ./matrixMul
nvprof --events tex0_cache_sector_queries --replay-mode application ./matrixMul

collects the event values and produces this result:
==40013== Profiling application: ./matrixMul
==40013== Profiling result:
==40013== Event result:
"Device","Kernel","Invocations","Event Name","Min","Max","Avg","Total"
"Tesla K80 (0)","void matrixMulCUDA<int=32>(float*, float*, float*, int, int)",301,"tex0_cache_sector_queries",0,30,24,7224
The result above is only a summary: the kernel matrixMulCUDA was invoked 301 times, but nvprof reports only the min, max, average, and total of the tex0_cache_sector_queries values across those 301 invocations. I want the complete set of 301 values, i.e. the tex0_cache_sector_queries event value for each individual invocation of matrixMulCUDA. How can I collect that?
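For context, the 301 invocations come from the benchmark launching the kernel repeatedly (apparently one warmup launch plus a 300-iteration timing loop in the matrixMul sample). The minimal, self-contained sketch below uses a hypothetical dummyKernel to illustrate that launch pattern; it is not the actual sample code, only a stand-in showing why nvprof sees 301 separate invocations of a single kernel:

#include <cuda_runtime.h>

// Hypothetical stand-in kernel; in the real program this is matrixMulCUDA<32>.
__global__ void dummyKernel(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // One warmup launch plus 300 measured launches = 301 invocations,
    // matching the invocation count nvprof reports above.
    dummyKernel<<<(n + 255) / 256, 256>>>(d_out, d_in, n);
    for (int j = 0; j < 300; j++) {
        dummyKernel<<<(n + 255) / 256, 256>>>(d_out, d_in, n);
    }
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

In the default summary mode, nvprof folds all 301 launches of the same kernel into a single row, which is why only min/max/avg/total are shown.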
Upvotes: 0
Views: 335
Reputation: 31
1 Run:
nvprof --pc-sampling-period 31 --print-gpu-trace --replay-mode application \
--export-profile application.prof --events tex0_cache_sector_queries ./matrixMul
2 Import application.prof into the Visual Profiler.
3 Follow the numbered steps in the picture to see the event value for every invocation of each kernel function.
4 The --print-gpu-trace option is what makes this work. Per the nvprof documentation (see the print-gpu-trace entry), it will "print individual kernel invocations (including CUDA memcpys/memsets) and sort them in chronological order. In event/metric profiling mode, show events/metrics for each kernel invocation", which is exactly what solves this problem (a stripped-down example is shown below).
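If you only need the per-invocation event values printed to the console, and not the PC-sampling data or an exported profile, a shorter command along the same lines should also work (this is a sketch based on the documented --print-gpu-trace behavior above, not part of the original answer):

nvprof --events tex0_cache_sector_queries --print-gpu-trace ./matrixMul

Each row of the GPU trace should then carry the tex0_cache_sector_queries value for that particular launch of matrixMulCUDA.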
Upvotes: 1