CUDA kernel error when using cuda-memcheck

Question

I have a CUDA kernel that runs fine under the Nsight CUDA profiler or when I run it directly from the terminal. However, when I run it with this command:

cuda-memcheck --leak-check full ./CudaTT 1 ../../file.jpg

It crashes with "unspecified launch failure". I'm using this check after each kernel launch:

e = cudaDeviceSynchronize();

if (e != cudaSuccess) printf("Fail in kernel 2: %s\n", cudaGetErrorString(e));
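A common refinement of this kind of check, sketched here as a hypothetical helper `checkKernel` (not part of the question's code), is to call `cudaGetLastError()` right after the launch, which reports launch-time failures such as "too many resources requested for launch" (a kernel using too many registers), and then `cudaDeviceSynchronize()`, which reports errors raised while the kernel runs, such as error 4 (unspecified launch failure) or error 6 (launch timed out) in CUDA 5.0. It only makes failures easier to attribute; it does not change whether the kernel fails under cuda-memcheck.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the question): checks for launch-time errors
// first, then waits for the kernel and reports execution errors.
// Returns the first error found, or cudaSuccess.
static cudaError_t checkKernel(const char *label)
{
    // Errors detected when the launch is issued (invalid configuration,
    // too many registers/resources requested, ...) are visible immediately.
    cudaError_t e = cudaGetLastError();
    if (e != cudaSuccess) {
        std::fprintf(stderr, "%s: launch failed: %s\n", label, cudaGetErrorString(e));
        return e;
    }
    // Errors that occur while the kernel is running (illegal address,
    // watchdog timeout, ...) are reported by the synchronization call.
    e = cudaDeviceSynchronize();
    if (e != cudaSuccess) {
        std::fprintf(stderr, "%s: execution failed: %s\n", label, cudaGetErrorString(e));
    }
    return e;
}
```

Usage would be `myKernel<<<grid, block>>>(args); checkKernel("kernel 2");`, where `myKernel`, `grid`, `block` and `args` stand in for the question's own launch.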

and cuda-memcheck reports several errors like this:

========= Program hit error 4 on CUDA API call to cudaDeviceSynchronize 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/libcuda.so [0x24e129]
=========     Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaDeviceSynchronize + 0x214) [0x27e24]
=========
========= Program hit error 4 on CUDA API call to cudaFree 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/libcuda.so [0x24e129]
=========     Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaFree + 0x228) [0x338b8]

At the end it shows:

========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 10 errors

Any idea why this happens?

Edit:

I commented out another kernel that was failing to launch because it used too many registers, and now the error from the kernel above has changed to "the launch timed out and was terminated". Again, it runs fine under the CUDA profiler and from the terminal without cuda-memcheck, but under cuda-memcheck it shows this:

========= Program hit error 6 on CUDA API call to cudaDeviceSynchronize 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/libcuda.so [0x24e129]
=========     Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaDeviceSynchronize + 0x214) [0x27e24]
=========
========= Program hit error 6 on CUDA API call to cudaFree 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/libcuda.so [0x24e129]
=========     Host Frame:/usr/local/cuda-5.0/lib/libcudart.so.5.0 (cudaFree + 0x228) [0x338b8]
=========     Host Frame:[0xbf913ea8]

and the same 10 errors at the end:

========= LEAK SUMMARY: 0 bytes leaked in 0 allocations
========= ERROR SUMMARY: 10 errors

Error 6 appears to be a timeout caused by a kernel running too long, but then why does it work without cuda-memcheck? The profiler shows the kernel taking 3.771 seconds.
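Whether such a timeout can apply at all is visible from code: the CUDA runtime exposes it as the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`. The minimal sketch below assumes device 0 is the GPU running the kernel; the function name `watchdogEnabled` is hypothetical. A result of 1 means kernels on that device can be killed by the run-time limit, so anything that slows a launch down past that limit will surface as a timeout error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the device has a kernel run-time limit
// (watchdog) enabled, 0 if it does not, and -1 if the properties could not
// be read (for example, when no CUDA device is present).
static int watchdogEnabled(int device)
{
    cudaDeviceProp prop;
    cudaError_t e = cudaGetDeviceProperties(&prop, device);
    if (e != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties(%d): %s\n",
                     device, cudaGetErrorString(e));
        return -1;
    }
    std::printf("device %d (%s): kernel run-time limit %s\n", device, prop.name,
                prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}
```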

Another strange behavior: I print some values after the calculations, and they differ depending on whether or not I run under cuda-memcheck.

Vyas · Accepted Answer

A better link would be http://docs.nvidia.com/cuda/cuda-memcheck/index.html. cuda-memcheck can, and does, alter the run time of an application's CUDA kernels. If the GPU is also driving a display, a watchdog timeout prevents any kernel from running longer than a fixed limit (on Linux this is usually about 5 seconds). Given that the uninstrumented kernel already takes 3.7 seconds, it is very likely that the instrumented version of the kernel that memcheck runs exceeds the watchdog limit, and so the kernel launch is timed out. There are a couple of options in such cases:

1. Run on a system where X has not been started.
2. Launch the X server in non-interactive mode using Option "Interactive" "off" in /etc/X11/xorg.conf. Note that in this mode the display will not update while the CUDA kernel is running.
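For the second option, a minimal sketch of the relevant xorg.conf section follows. It assumes the option is added to the existing Device section for the NVIDIA GPU; the `Identifier` value shown is a placeholder and should match whatever the existing file already uses. The X server has to be restarted for the change to take effect.

```conf
Section "Device"
    # Placeholder identifier: keep the one already present in your xorg.conf.
    Identifier "Device0"
    Driver     "nvidia"
    # Turn off the driver's watchdog for this GPU so long-running kernels
    # are not terminated; the display will not update while a kernel runs.
    Option     "Interactive" "off"
EndSection
```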
