Reputation: 45
I have a question about my code and whether I can run it on my current device or not. Basically, I want to do a 3D interpolation. When I launch my interpolation kernel, I get the following error: kernel failure: invalid configuration argument
I saw in this discussion that it can happen if you call too many threads or blocks, but I am not sure it is the case in my code. Could someone have a look at it and tell me what's wrong?
Here is how I call my kernel:
dim3 blockSize(6,6,6);
dim3 threadSize(dimX/blockSize.x,dimY/blockSize.y,dimZ/blockSize.z);
d_interpolate_kernel<<<blockSize,threadSize>>>(output,dimX,dimY,dimZ);
My dimensions are dimX = 54 or 108, dimY=dimX=42 or 84. So I have blockSize(6,6,6) and threadSize(9,7,7) or (18,14,14).
My card has the following capabilities:
MAX_BLOCK_DIM_X = 512
MAX_BLOCK_DIM_Y = 512
MAX_BLOCK_DIM_Z = 64
MAX_GRID_DIM_X = 65535
MAX_GRID_DIM_Y = 65535
MAX_GRID_DIM_Z = 1
Do I get the error because MAX_GRID_DIM_Z is 1? If yes, is there a way around this?
Thank you!
Upvotes: 3
Views: 3533
Reputation: 151879
One problem is you have your blockSize
and threadSize
variables reversed in your kernel call.
You want something like this:
d_interpolate_kernel<<<threadSize,blockSize>>>(output,dimX,dimY,dimZ);
The first configuration argument is the size of the grid in blocks.
The second configuration argument is the size of the block in threads.
Since you have them reversed, your (18,14,14) values are not acceptable block sizes (too many threads), since the max number of threads per block is 512 (for cc1.x) or 1024 (otherwise), whereas 18x14x14 = 3528.
For me, threadSize
is a confusing name. I would have called it gridSize
or something like that.
The second problem as you've pointed out is that for a cc1.x card (which seems to be what you have) your Z grid dimension must be 1. At least for your 42 case, you can fix this by re-structuring the thread blocks to have a dimension of, say, (2,2,42) and your grid a dimension of, say, (27, 21, 1).
Otherwise, these indices are just arbitrary numbering schemes. You can come up with a 2D grid that covers all of your 3D volume, using a (6, 6, 6) block size if that is what you want. You just need to get creative about how you map the blockIdx.x
and blockIdx.y
built-in variables in your interpolation kernel, to simulate a 3D grid.
Upvotes: 3