Reputation: 31
I have completed writing my CUDA kernel, and confirmed it runs as expected when I compile it using nvcc directly, by:
Yet the results printed to the terminal while the application is being profiled with Nsight Compute differ from run to run. I am curious whether this difference is a cause for concern or whether it is expected behavior.
Note: The application also gives correct and consistent results while being profiled by nvprof.
Upvotes: 0
Views: 763
Reputation: 31
I was able to resolve the issue by addressing my shared memory initializations. Since Nsight Compute runs a kernel multiple times as @Jackson stated, the effects of uninitialized memory were amplified (I was performing atomicAdd into uninitialized memory).
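For anyone hitting the same thing, here is a minimal sketch of the kind of fix I mean. It is not my actual kernel: the histogram, the names `histogram`, `run_histogram` and `kBins`, and the launch configuration are all made up for illustration. The point is only that shared memory is zeroed and synchronized before any `atomicAdd` touches it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBins = 8;  // hypothetical bin count, chosen for illustration

// Each block builds a partial histogram in shared memory, then merges it
// into the global result. Assumes all input values are non-negative.
__global__ void histogram(const int* data, int n, unsigned int* bins) {
    __shared__ unsigned int local[kBins];

    // Shared memory is not zeroed for you; clear it explicitly.
    for (int i = threadIdx.x; i < kBins; i += blockDim.x) {
        local[i] = 0;
    }
    __syncthreads();  // every bin must be zero before the first atomicAdd

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        atomicAdd(&local[data[i] % kBins], 1u);
    }
    __syncthreads();  // all local counts must be final before merging

    for (int i = threadIdx.x; i < kBins; i += blockDim.x) {
        atomicAdd(&bins[i], local[i]);
    }
}

// Host wrapper (error checking omitted to keep the sketch short).
std::vector<unsigned int> run_histogram(const std::vector<int>& host) {
    const int n = static_cast<int>(host.size());
    int* d_data = nullptr;
    unsigned int* d_bins = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_bins, kBins * sizeof(unsigned int));
    cudaMemcpy(d_data, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    // The global accumulators also need a defined starting value.
    cudaMemset(d_bins, 0, kBins * sizeof(unsigned int));

    histogram<<<4, 128>>>(d_data, n, d_bins);

    std::vector<unsigned int> out(kBins);
    cudaMemcpy(out.data(), d_bins, kBins * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);
    return out;
}

int main() {
    std::vector<int> host(1024);
    for (int i = 0; i < 1024; ++i) host[i] = i;
    std::vector<unsigned int> bins = run_histogram(host);
    for (int b = 0; b < kBins; ++b) {
        printf("bin %d: %u\n", b, bins[b]);
    }
    return 0;
}
```

Without the zeroing loop and the first `__syncthreads()`, the counts start from whatever happened to be in shared memory, which can look fine in a plain run and then change under a profiler.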
Upvotes: 2
Reputation: 21
I followed up on the NVIDIA forums but will post here as well for tracking:
What inconsistencies are you seeing in the output? Nsight Compute runs a kernel multiple times to collect all of its information, so things like print statements in the kernel will show up multiple times. Could it be related to that, or is a value being calculated differently? One other issue is Unified Memory (UVM) and zero-copy memory: Nsight Compute is not able to restore those values before each replay. Are you using either of those in your application? If so, the application replay mode could help. It may be worth trying it to see if anything changes.
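To illustrate why replay can matter, here is a small sketch. It does not run under the profiler; it simply launches each kernel more than once to imitate extra passes over memory that is not restored in between, which is the situation described above for managed memory. The kernel and function names are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Not replay-safe: every extra pass over the same memory adds again.
__global__ void accumulate_in_place(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Replay-safe: the output depends only on an input that is never modified.
__global__ void write_from_input(const int* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1;
}

// Returns element 0 after `passes` launches of the in-place kernel.
int in_place_result(int passes) {
    const int n = 4;
    int* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;
    for (int p = 0; p < passes; ++p) accumulate_in_place<<<1, n>>>(data, n);
    cudaDeviceSynchronize();
    int result = data[0];
    cudaFree(data);
    return result;
}

// Returns output element 0 after `passes` launches of the replay-safe kernel.
int replay_safe_result(int passes) {
    const int n = 4;
    int* in = nullptr;
    int* out = nullptr;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) { in[i] = i; out[i] = 0; }
    for (int p = 0; p < passes; ++p) write_from_input<<<1, n>>>(in, out, n);
    cudaDeviceSynchronize();
    int result = out[0];
    cudaFree(in);
    cudaFree(out);
    return result;
}

int main() {
    // Two launches stand in for two replay passes without a memory restore.
    printf("in-place after 2 passes:     %d\n", in_place_result(2));     // 2
    printf("replay-safe after 2 passes:  %d\n", replay_safe_result(2));  // 1
    return 0;
}
```

If your kernel looks like the first one and operates on managed or zero-copy memory, differing results under the profiler would not be surprising. In recent Nsight Compute versions, application replay is selected on the command line with `--replay-mode application`.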
Upvotes: 2