Reputation: 1740
I have a CUDA kernel that runs fine under the Nsight CUDA profiler or when I launch it directly from the terminal. But if I run it with this command:
cuda-memcheck --leak-check full ./CudaTT 1 ../../file.jpg
it crashes with "unspecified launch failure". I run this check after each kernel launch:
e=cudaDeviceSynchronize();
if (e != cudaSuccess) printf("Fail in kernel 2 %s",cudaGetErrorString(e));
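A slightly fuller form of this check might look like the sketch below. The names `checkCuda` and `dummyKernel` are hypothetical placeholders, not taken from the actual program. The sketch also calls `cudaGetLastError()` right after the launch, because a kernel that cannot be launched at all (for example, with an invalid launch configuration) reports its error there, while errors raised during execution surface at `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a labelled message if a CUDA call failed.
static bool checkCuda(cudaError_t e, const char *what) {
    if (e != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(e));
        return false;
    }
    return true;
}

// Placeholder kernel standing in for the real one.
__global__ void dummyKernel(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

int main() {
    int *d_out = NULL;
    if (!checkCuda(cudaMalloc(&d_out, 32 * sizeof(int)), "cudaMalloc")) return 1;

    dummyKernel<<<1, 32>>>(d_out);
    // Errors detected at launch time are reported here.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");
    // Errors raised while the kernel runs (launch failure, timeout) are reported here.
    ok = checkCuda(cudaDeviceSynchronize(), "kernel execution") && ok;

    checkCuda(cudaFree(d_out), "cudaFree");
    return ok ? 0 : 1;
}
```

Labelling each check makes it easier to tell which kernel or API call a given failure belongs to.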
cuda-memcheck then reports several errors like these:
========= Program hit error 4 on CUDA API call to cudaDeviceSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x24e129]
========= Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaDeviceSynchronize + 0x214) [0x27e24]
=========
========= Program hit error 4 on CUDA API call to cudaFree
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x24e129]
========= Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaFree + 0x228) [0x338b8]
At the end it shows:
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 10 errors
Any idea why this happens?
Edit:
I commented out another kernel, which was failing to launch because it used too many registers, and the error on the kernel above changed. It now says: "the launch timed out and was terminated". Again, it runs fine under the CUDA profiler and from the terminal without cuda-memcheck, but with cuda-memcheck it shows this:
========= Program hit error 6 on CUDA API call to cudaDeviceSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x24e129]
========= Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaDeviceSynchronize + 0x214) [0x27e24]
=========
========= Program hit error 6 on CUDA API call to cudaFree
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/libcuda.so [0x24e129]
========= Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaFree + 0x228) [0x338b8]
========= Host Frame:[0xbf913ea8]
followed by the same 10 errors at the end:
========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 10 errors
Error 6 appears to be a timeout caused by a kernel running for too long, but then why does it work without cuda-memcheck? The profiler shows that the kernel runs for 3.771 seconds.
Another strange behavior: I print some values after the calculations, and those values differ depending on whether I run under cuda-memcheck or not.
Upvotes: 1
Views: 7066
Reputation: 519
A better link would be http://docs.nvidia.com/cuda/cuda-memcheck/index.html. cuda-memcheck can and does alter the run time of the application's CUDA kernels. If the GPU is also being used for display, a watchdog timeout prevents any kernel from running longer than a fixed limit (on Linux, this is usually ~5 seconds). Given that the uninstrumented kernel takes 3.7 seconds, it is very likely that the instrumented version run by memcheck exceeds the watchdog limit, so the kernel launch is timed out. There are a couple of options in such cases:
Option "Interactive" "off"
in /etc/X11/xorg.conf. Note that in this mode, the display will not update while the CUDA kernel is running.
Upvotes: 2
Reputation: 1740
It appears that kernels run much slower under cuda-memcheck. From page 16 of
people.maths.ox.ac.uk/gilesm/cuda/doc/cuda-memcheck.pdf:
"Applications run much slower under CUDA-MEMCHECK. This may cause some kernel launches to fail with a launch timeout error when running with CUDA-MEMCHECK enabled."
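Whether a launch timeout can apply to a given GPU can be checked from code. The following is only a sketch: `queryWatchdog` is a hypothetical helper name, and it reads the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, which reports whether a run time limit applies to kernels on that device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: sets *enabled to 1 if a kernel run time limit
// (watchdog) applies to device `dev`, 0 otherwise.
static cudaError_t queryWatchdog(int dev, int *enabled) {
    cudaDeviceProp prop;
    cudaError_t e = cudaGetDeviceProperties(&prop, dev);
    if (e == cudaSuccess) {
        *enabled = (prop.kernelExecTimeoutEnabled != 0) ? 1 : 0;
    }
    return e;
}

int main() {
    int count = 0;
    cudaError_t e = cudaGetDeviceCount(&count);
    if (e != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(e));
        return 1;
    }
    // Report the watchdog state for every visible device.
    for (int dev = 0; dev < count; ++dev) {
        int enabled = 0;
        e = queryWatchdog(dev, &enabled);
        if (e != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(e));
            continue;
        }
        printf("device %d: kernel run time limit %s\n",
               dev, enabled ? "enabled" : "disabled");
    }
    return 0;
}
```

If the limit is reported as enabled for the device that runs the kernel, a kernel slowed down by cuda-memcheck can hit the timeout even though the uninstrumented run finishes in time.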
Upvotes: 0