Reputation: 10258
I am currently writing a matrix multiplication on a GPU and would like to debug my code. Since I cannot use printf inside a device function, is there something else I can do to see what is going on inside that function? This is my current function:
__global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd){
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int bx = blockIdx.x;
    int by = blockIdx.y;
    float sum = 0;
    for( int k = 0; k < Ad.width ; ++k){
        float Melement = Ad.elements[ty * Ad.width + k];
        float Nelement = Bd.elements[k * Bd.width + tx];
        sum += Melement * Nelement;
    }
    Xd.elements[ty * Xd.width + tx] = sum;
}
I would love to know whether Ad and Bd are what I think they are, and to see whether that function is actually being called.
Upvotes: 40
Views: 94499
Reputation: 8640
CUDA now supports printfs directly in the kernel.
See the Formatted Output section of NVIDIA's online docs.
Formatted output is only supported by devices of compute capability 2.x and higher.
int printf(const char *format[, arg, ...]);
For past versions' docs, see this page.
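As a rough illustration of the device-side printf described above, here is a minimal sketch. The kernel name `helloKernel`, the helper `launchHello`, and the `tag` argument are made up for this example and do not come from the question. It assumes a device of compute capability 2.x or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: every thread reports its own indices.
__global__ void helloKernel(int tag) {
    printf("tag=%d block=(%d,%d) thread=(%d,%d)\n",
           tag, blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y);
}

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t launchHello(int tag) {
    dim3 grid(1, 1);
    dim3 block(2, 2);
    helloKernel<<<grid, block>>>(tag);

    // Catches problems with the launch itself (bad configuration, etc.).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        return err;
    }
    // Device printf output is buffered; synchronizing flushes it to stdout.
    return cudaDeviceSynchronize();
}

int main() {
    cudaError_t err = launchHello(42);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If lines show up on stdout, the kernel was called. The order of lines across threads is not guaranteed.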
Upvotes: 84
Reputation: 21128
EDIT
To avoid misleading people: as M. Tibbits points out, printf is available on any GPU of compute capability 2.0 and higher.
END OF EDIT
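To find out whether a given card meets that requirement, one option is to query the compute capability at runtime. The following is only a sketch. `supportsDevicePrintf` is a hypothetical helper written for this example, while `cudaGetDeviceCount` and `cudaGetDeviceProperties` are the runtime API calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: device-side printf needs compute capability 2.0 or higher.
static bool supportsDevicePrintf(int major) {
    return major >= 2;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices we cannot query
        }
        printf("device %d (%s): compute capability %d.%d, device printf %s\n",
               dev, prop.name, prop.major, prop.minor,
               supportsDevicePrintf(prop.major) ? "available" : "unavailable");
    }
    return 0;
}
```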
You have choices:
Regarding your code snippet:
Consider passing the Matrix structs in via pointer (i.e. cudaMemcpy them to the device, then pass in the device pointer). Right now you will have no problem, but if the function signature gets very large you may hit the 256-byte limit on kernel arguments.
Upvotes: 17
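The pass-by-pointer suggestion in the answer above could look roughly like the sketch below. The `Matrix` layout is an assumption inferred from the fields used in the question (`width`, `elements`), since the real definition is never shown. `showMatrix` and `runDemo` are hypothetical names. The kernel also prints what it received, which is one way to check that the struct is what you expect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layout; the question does not show the real Matrix definition.
struct Matrix {
    int width;
    int height;
    float* elements;  // must point to device memory when used in a kernel
};

// Hypothetical kernel taking the struct by device pointer instead of by value.
__global__ void showMatrix(const Matrix* m) {
    // Let a single thread print so the output stays readable.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        printf("width=%d height=%d first=%f\n",
               m->width, m->height, m->elements[0]);
    }
}

// Hypothetical driver: returns 0 on success, 1 on any CUDA error.
int runDemo() {
    const int w = 2, h = 2;
    float hostData[w * h] = {1.0f, 2.0f, 3.0f, 4.0f};
    float* dElements = nullptr;
    Matrix* dMatrix = nullptr;
    int status = 1;

    // 1. Copy the element data to the device.
    if (cudaMalloc(&dElements, sizeof(hostData)) == cudaSuccess &&
        cudaMemcpy(dElements, hostData, sizeof(hostData),
                   cudaMemcpyHostToDevice) == cudaSuccess) {
        // 2. Stage a host copy of the struct whose pointer refers to device memory.
        Matrix staged = {w, h, dElements};
        // 3. Copy the struct itself to the device and pass that pointer in.
        if (cudaMalloc(&dMatrix, sizeof(Matrix)) == cudaSuccess &&
            cudaMemcpy(dMatrix, &staged, sizeof(Matrix),
                       cudaMemcpyHostToDevice) == cudaSuccess) {
            showMatrix<<<1, 1>>>(dMatrix);
            if (cudaGetLastError() == cudaSuccess &&
                cudaDeviceSynchronize() == cudaSuccess) {
                status = 0;
            }
        }
    }

    cudaFree(dMatrix);    // cudaFree accepts a null pointer
    cudaFree(dElements);
    return status;
}

int main() {
    return runDemo();
}
```

With this shape the kernel argument is a single pointer regardless of how many fields the struct grows to hold.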
Reputation: 3846
See the "Formatted Output" section (currently B.17) of the CUDA C Programming Guide.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
Upvotes: 2
Reputation: 7608
by the way..
Upvotes: 4