Reputation: 365
I'm trying to debug a program that compiles without errors or warnings and runs correctly. The problem is that when I step through it with cuda-gdb, no CUDA kernels appear to be launched at all (the debugger output is completely different from the one shown in the Nvidia cuda-gdb guide), yet the program still works without any errors. At all times I get No CUDA kernels, devices or threads. Apparently focus is not set on anything either. I'm using the 4.2 release of CUDA-GDB.
This is what I get from the debugger when it should launch the kernel:
Breakpoint 1, matrixMulGPU (M=0x609160, N=0x609270, P=0x609490, Width=8)
at matrixMul1.cu:141
141 MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);
(cuda-gdb) step
MatrixMulKernel (__cuda_0=0x210000, __cuda_1=0x210100, __cuda_2=0x210200,
__cuda_3=8) at matrixMul1.cu:103
103 __global__ void MatrixMulKernel(float *Md, float *Nd, float *Pd, int Width){
(cuda-gdb) step
__device_stub__Z15MatrixMulKernelPfS_S_i (__par0=0x210000, __par1=0x210100,
__par2=0x210200, __par3=8)
at tmpxft_000016d4_00000000-1_matrixMul1.cudafe1.stub.c:5
5 tmpxft_000016d4_00000000-1_matrixMul1.cudafe1.stub.c: No such file or directory.
in tmpxft_000016d4_00000000-1_matrixMul1.cudafe1.stub.c
(cuda-gdb) step
cudaLaunch<char> (
entry=0x4011ea "UH\211\345SH\203\354(H\211}\350H\211u\340H\211U؉MԋM\324H\213U\330H\213]\340H\213E\350H\211\336H\211\307\350\024\377\377\377H\203\304([\311\303UH\211\345SH\203\354(\277Pn@") at cuda_runtime.h:958
958 return cudaLaunch((const char*)entry);
(cuda-gdb) step
959 }
(cuda-gdb) step
MatrixMulKernel (__cuda_0=0x210000, __cuda_1=0x210100, __cuda_2=0x210200,
__cuda_3=8) at matrixMul1.cu:121
121 }
My CUDA device is a GeForce 8400M GS, and the deviceQuery check ran without problems. I have no idea how to solve this, and the Nvidia forum is offline these days!
Thanks a lot in advance.
Upvotes: 0
Views: 1050
Reputation: 519
Looking at the cuda-gdb output, you seem to be on the host component of the kernel launch (i.e. the <<< >>>). CUDA kernel launches are asynchronous. The host call prepares the launch and returns before the kernel has completed (or, in some cases, before the launched work has even started). As a result, while you are stopped on the host, the launched work may not yet have been dispatched to the GPU.
Stepping into the host-side kernel launch call will not step into the kernel on the device. Instead, set a breakpoint inside the kernel itself and let the application run freely. A breakpoint can be set by file and line number (e.g. break matrixMul1.cu:<line>) or by name (e.g. break MatrixMulKernel). When the device-side breakpoint is hit, cuda-gdb will return to the prompt and set focus on the device as appropriate.
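To make the asynchronous behaviour concrete, here is a minimal sketch of a launch followed by an explicit synchronization. It is not the asker's code: the kernel body is a naive matrix multiply written for illustration, and the helper name gpuMatchesIdentityProduct, the identity-matrix test data and the fixed 8x8 block size are all assumptions. Only the kernel name, its parameter list and Width=8 come from the debugger output above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel (assumed body): one thread computes one element of
// Pd = Md * Nd for square, row-major Width x Width matrices.
__global__ void MatrixMulKernel(float *Md, float *Nd, float *Pd, int Width)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= Width || col >= Width) return;

    float sum = 0.0f;  // a breakpoint on a line like this one is a device-side breakpoint
    for (int k = 0; k < Width; ++k)
        sum += Md[row * Width + k] * Nd[k * Width + col];
    Pd[row * Width + col] = sum;
}

// Hypothetical helper: multiplies M by the identity matrix on the GPU and
// returns true if the result equals M and no CUDA call reported an error.
bool gpuMatchesIdentityProduct(int Width)
{
    size_t count = (size_t)Width * Width;
    size_t bytes = count * sizeof(float);
    std::vector<float> M(count), N(count, 0.0f), P(count, -1.0f);
    for (size_t i = 0; i < count; ++i) M[i] = (float)i;
    for (int i = 0; i < Width; ++i) N[i * Width + i] = 1.0f;  // identity matrix

    float *Md = NULL, *Nd = NULL, *Pd = NULL;
    bool ok = cudaMalloc(&Md, bytes) == cudaSuccess
           && cudaMalloc(&Nd, bytes) == cudaSuccess
           && cudaMalloc(&Pd, bytes) == cudaSuccess
           && cudaMemcpy(Md, M.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(Nd, N.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 dimBlock(8, 8);  // 64 threads per block (assumed size)
        dim3 dimGrid((Width + 7) / 8, (Width + 7) / 8);

        // This statement only queues the kernel; control returns to the host
        // right away, which is why stepping here stays in host stub code.
        MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);

        // Errors in the launch configuration surface here...
        ok = cudaGetLastError() == cudaSuccess
        // ...and the host blocks here until the kernel has actually finished.
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(P.data(), Pd, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);

    // M * I must equal M exactly, since each sum has a single non-zero term.
    for (size_t i = 0; ok && i < count; ++i)
        if (P[i] != M[i]) ok = false;
    return ok;
}

int main()
{
    bool ok = gpuMatchesIdentityProduct(8);
    printf("GPU result %s the expected product\n", ok ? "matches" : "does not match");
    return ok ? 0 : 1;
}
```

Built with host and device debug information (nvcc -g -G), a program like this can be started under cuda-gdb with break MatrixMulKernel followed by run, as described above. The breakpoint should then be reported from device code rather than from the host-side launch stub.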
Upvotes: 1