CUDA kernel launched from Nsight Compute gives inconsistent results

Question

I have completed writing my CUDA kernel, and confirmed it runs as expected when I compile it using nvcc directly, by:

Validating with test data over 100 runs (just in case)
Using cuda-memcheck (memcheck, synccheck, racecheck, initcheck)

Yet the results printed to the terminal while the application is being profiled with Nsight Compute differ from run to run. Is this difference a cause for concern, or is it expected behavior?

Note: The application also gives correct and consistent results while being profiled by nvprof.

forever__newbie · Accepted Answer

I resolved the issue by fixing my shared memory initialization. Because Nsight Compute runs a kernel multiple times, as @Jackson stated, the effects of uninitialized memory were amplified: I was performing atomicAdd into shared memory that had never been initialized.
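For anyone hitting the same symptom, here is a minimal sketch of the kind of fix described above. It is not my actual kernel: the names `histogram`, `count_bins`, and `kBins` are made up for illustration, and the launch configuration is arbitrary. The point is only that `__shared__` memory is not zeroed at launch, so it has to be cleared (followed by a `__syncthreads()`) before any thread does an `atomicAdd` into it. As far as I know, the initcheck tool only tracks global memory, which would explain why it did not flag this.

```cuda
#include <cuda_runtime.h>

constexpr int kBins = 8;  // hypothetical bin count, chosen for the example

// Illustrative kernel: per-block histogram accumulated in shared memory.
// Assumes all input values are non-negative.
__global__ void histogram(const int* in, int n, unsigned int* out)
{
    __shared__ unsigned int local[kBins];

    // Shared memory holds arbitrary values at launch: clear it explicitly.
    for (int i = threadIdx.x; i < kBins; i += blockDim.x)
        local[i] = 0;
    __syncthreads();  // every bin must be zero before any atomicAdd

    // Grid-stride loop: accumulate into the block-local bins.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&local[in[i] % kBins], 1u);
    __syncthreads();  // all local updates done before merging

    // Merge this block's bins into the global result.
    for (int i = threadIdx.x; i < kBins; i += blockDim.x)
        atomicAdd(&out[i], local[i]);
}

// Hypothetical host helper: fills host_out[kBins], returns true on success.
bool count_bins(const int* host_in, int n, unsigned int* host_out)
{
    int* d_in = nullptr;
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_out, kBins * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    cudaError_t err = cudaMemcpy(d_in, host_in, n * sizeof(int),
                                 cudaMemcpyHostToDevice);
    // Memory from cudaMalloc is not zeroed either, so clear the output too.
    if (err == cudaSuccess)
        err = cudaMemset(d_out, 0, kBins * sizeof(unsigned int));

    if (err == cudaSuccess) {
        histogram<<<4, 128>>>(d_in, n, d_out);
        err = cudaGetLastError();           // launch configuration errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(host_out, d_out, kBins * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Calling `count_bins` from `main` with the same input should give the same bins on every run, with or without a profiler attached. If the zeroing loop or the first `__syncthreads()` is removed, the result depends on whatever happened to be left in shared memory, which is the kind of run-to-run variation I was seeing.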
