Reputation: 930
I have a problem launching a kernel. I launch it with a grid size of (3000000, 16), and CUDA reports an "invalid argument" runtime error. I tried different maxPixelCount values and found that when maxPixelCount is 200000 the error is reported, while with 50000 the launch proceeds without error.
dim3 dimGrid(maxPixelCount, imageCount);
printf("grid: %u * %u * %u\n", dimGrid.x, dimGrid.y, dimGrid.z);
mcudaGetGrayDataKernel <<< dimGrid, 1 >>> (deviceDestDataPtrs, deviceImageDataPtrs, deviceSizes);
cudaStatus = cudaGetLastError();
if (cudaStatus != cudaSuccess) {
printf("cuda start kernel error\n%s", cudaGetErrorString(cudaStatus));
goto Error;
}
I checked the maximum grid size my card supports with the following statement:
printf(" - max grid size: %d * %d * %d\n",
prop.maxGridSize[0],
prop.maxGridSize[1],
prop.maxGridSize[2]);
I got the following message:
- max grid size: 2147483647 * 65535 * 65535
I think this means my grid dimensions are within the allowed range. So why does the error appear?
My IDE is Visual Studio 2013
This problem has been solved. To reach the maximum grid size, the Device -> Code Generation option has to be set to the proper version. For my GPU I changed it to compute_30,sm_30.
Upvotes: 1
Views: 3319
Reputation: 151799
This formulation:
dim3 dimGrid(maxPixelCount, imageCount);
places maxPixelCount in the .x dimension of the variable (dimGrid) that will be used to specify the grid dimensions of the kernel launch:
mcudaGetGrayDataKernel <<< dimGrid, 1 >>> ...
By referring to the programming guide (or by running the deviceQuery sample code, or by querying the data yourself programmatically), we can see that devices of compute capability 2.0 only support up to 65535 in the .x dimension of the grid. In order to use the larger limit (2^31 - 1) available on a compute capability 3.0 (or higher) device, it's necessary both to run the code on a compute capability 3.0 (or higher) device and to compile the code for a compute capability 3.0 (or higher) device.
There are various ways to specify compilation for a compute capability 3.0 device. Most of the CUDA sample code projects demonstrate this for both Windows (Visual Studio) and Linux (Makefile) projects. For more information on how to compile for a given device architecture, and what the various switches mean, refer to this answer and this answer and the relevant section of the nvcc manual.
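To tie the limits above back to the numbers in the question, here is a minimal host-side sketch in plain C++ (it also compiles inside a .cu file). The helper names maxGridDimX, usableMaxGridDimX and gridXFits are hypothetical and are not part of the CUDA API. Treating the usable limit as the smaller of the device's and the build target's limit is a simplification of the two conditions in this answer. The example also assumes the question's project was originally built for compute capability 2.0.

```cpp
#include <cstdint>

// Grid x-dimension limit by compute capability major version:
// 65535 below 3.0, 2^31 - 1 for 3.0 and higher.
constexpr std::int64_t maxGridDimX(int ccMajor) {
    return ccMajor >= 3 ? 2147483647LL : 65535LL;
}

// Hypothetical helper: the larger limit is only usable when BOTH the device
// and the compilation target are compute capability 3.0 or higher, so take
// the limit of the lower of the two versions.
constexpr std::int64_t usableMaxGridDimX(int deviceCcMajor, int compiledCcMajor) {
    return maxGridDimX(deviceCcMajor < compiledCcMajor ? deviceCcMajor
                                                       : compiledCcMajor);
}

// True when a requested grid x-dimension is within the usable limit.
constexpr bool gridXFits(std::int64_t gridX, int deviceCcMajor, int compiledCcMajor) {
    return gridX >= 1 && gridX <= usableMaxGridDimX(deviceCcMajor, compiledCcMajor);
}

// The question's cases: a cc 3.x device with code assumed built for cc 2.0.
static_assert(!gridXFits(200000, 3, 2), "200000 exceeds 65535, launch is rejected");
static_assert(gridXFits(50000, 3, 2), "50000 is within 65535, launch is accepted");
// After rebuilding for compute_30,sm_30 the original grid fits.
static_assert(gridXFits(3000000, 3, 3), "3000000 is within 2^31 - 1");
```

This matches the reported behaviour: the device query shows 2147483647 because it describes the hardware, while the launch is still limited by the architecture the kernel was compiled for.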
Upvotes: 3