Reputation: 4923
I've been running CUDA kernels, and I observe a considerable difference between the kernel execution time reported by GPU counters and the time reported by NVVP. Why is such a difference usually observed?
Upvotes: 0
Views: 1265
Reputation: 11549
Nsight Visual Studio Edition and the Visual Profiler support two mechanisms for capturing the duration of a kernel. Both of these methods produce a value that is smaller and more accurate than what is reported by CUevent/cudaEvent. The methods are as follows:
Concurrent kernel trace: This is the default mode used by Nsight 2.x and Visual Profiler 5.0 to generate a timeline. The duration of a kernel is defined as the time from when the kernel code starts executing on the device to the time it completes. This cannot be measured using CUDA events.
Serialized kernel trace: This is the default mode used by the tools when collecting PM counters for each kernel. The duration of a kernel is defined as the time from when the GPU processes the launch request until the GPU idles after completion of the kernel. This mode specifically disables concurrent kernel execution. In almost all cases the reported duration will be slightly larger than the concurrent kernel trace duration, as it includes time for the GPU to launch the first block and time for the GPU to complete all memory stores.
CUDA event timing is done by calling cu/cudaEventRecord before and after the kernel launch on the same stream. Each event record inserts a command into the GPU push buffer. When the command reaches the GPU, it writes a timestamp to memory. It is also possible to push two event records with no launch between them, which allows a developer to measure the GPU time between the two timestamp commands. This method has the following disadvantages, which is why I encourage developers to use the tools (Nsight, Visual Profiler, and CUPTI):
a. The GPU can context switch between the start event record and the kernel execution.

b. The start event record will include launch overhead, including the time to update driver buffers that need to be resized, copy parameters, copy texture bindings, ...

c. The elapsed time between submitting the kernel and the end event record can impact the timing.

d. The GPU can context switch between the end of the kernel execution and the end event record.

e. Incorrect use of events will break concurrent kernel execution.
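The event-based timing described above can be sketched as follows. This is a minimal example, not the exact code from any tool; the trivial `myKernel` and its launch configuration are placeholders for illustration, and running it requires a CUDA-capable GPU:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical) so the timing code has something to measure.
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Record events on the same stream immediately before and after the launch.
    // Note that the measured interval includes launch overhead and any context
    // switches between the timestamp commands, not just kernel execution.
    cudaEventRecord(start, 0);
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);

    cudaEventSynchronize(stop);  // wait until the stop timestamp has been written
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel + overhead: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```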
Each of these modes will report a different value for the duration. Furthermore, the definition of duration used by the tools differs from the one available through events.
The NVIDIA tools define duration, as closely as possible, as the time from when the GPU starts working on the kernel to when the GPU completes that work. Developers interested in collecting this information should look at the CUPTI SDK included with the toolkit.
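As a rough illustration of collecting per-kernel start/end timestamps with CUPTI, the asynchronous activity-buffer API can be used as sketched below. This is an assumption-laden sketch, not a complete profiler: the buffer size is arbitrary, error checking is omitted, and the exact kernel record struct name (`CUpti_ActivityKernel`, `CUpti_ActivityKernel2`, ...) varies by toolkit version:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cupti.h>

#define BUF_SIZE (32 * 1024)  // arbitrary buffer size for this sketch

// CUPTI calls this when it needs a buffer to store activity records.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords) {
    *buffer = (uint8_t *)malloc(BUF_SIZE);
    *size = BUF_SIZE;
    *maxNumRecords = 0;  // let CUPTI fit as many records as possible
}

// CUPTI calls this when a buffer is full or flushed; walk the records.
static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize) {
    CUpti_Activity *record = NULL;
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) ==
           CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_KERNEL) {
            // Struct version depends on the CUPTI release in your toolkit.
            CUpti_ActivityKernel *k = (CUpti_ActivityKernel *)record;
            printf("%s: %llu ns\n", k->name,
                   (unsigned long long)(k->end - k->start));
        }
    }
    free(buffer);
}

int main() {
    // Enable kernel activity records and register the buffer callbacks
    // before launching any kernels.
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_KERNEL);
    cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);

    // ... launch kernels here ...

    cuptiActivityFlushAll(0);  // deliver any buffered records before exit
    return 0;
}
```

The `start` and `end` fields of the kernel record are the GPU-side timestamps the tools use, which is why they exclude the launch and context-switch overheads that inflate event-based measurements.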
Upvotes: 4