azhew

Reputation: 45

kernel failure: invalid configuration argument

I have a question about my code and whether I can run it on my current device or not. Basically, I want to do a 3D interpolation. When I launch my interpolation kernel, I get the following error: kernel failure: invalid configuration argument

I saw in this discussion that it can happen if you launch too many threads or blocks, but I am not sure that is the case in my code. Could someone have a look at it and tell me what's wrong?

Here is how I call my kernel:

dim3 blockSize(6,6,6);

dim3 threadSize(dimX/blockSize.x,dimY/blockSize.y,dimZ/blockSize.z);

d_interpolate_kernel<<<blockSize,threadSize>>>(output,dimX,dimY,dimZ);

My dimensions are dimX = 54 or 108 and dimY = dimZ = 42 or 84. So I have blockSize(6,6,6) and threadSize(9,7,7) or (18,14,14).

My card has the following capabilities:

MAX_BLOCK_DIM_X = 512
MAX_BLOCK_DIM_Y = 512
MAX_BLOCK_DIM_Z = 64

MAX_GRID_DIM_X = 65535
MAX_GRID_DIM_Y = 65535
MAX_GRID_DIM_Z = 1

Do I get the error because MAX_GRID_DIM_Z is 1? If yes, is there a way around this?

Thank you!

Upvotes: 3

Views: 3533

Answers (1)

Robert Crovella

Reputation: 151879

One problem is you have your blockSize and threadSize variables reversed in your kernel call.

You want something like this:

d_interpolate_kernel<<<threadSize,blockSize>>>(output,dimX,dimY,dimZ);

The first configuration argument is the size of the grid in blocks.

The second configuration argument is the size of the block in threads.

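Since you have them reversed, your (18,14,14) values are being used as the block size, and they are not an acceptable block size (too many threads): the max number of threads per block is 512 (for cc1.x) or 1024 (otherwise), whereas 18 x 14 x 14 = 3528. Your (9,7,7) values give 441 threads, which fits, but that launch then uses (6,6,6) as the grid and still fails for the second reason below.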
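
To make the limits concrete, here is a minimal sketch of a host-side check that compares a launch configuration against the limits the device reports. launch_fits is a hypothetical helper name, not part of the CUDA API; the cudaDeviceProp fields it reads (maxThreadsPerBlock, maxThreadsDim, maxGridSize) are the runtime's own.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): returns true if a <<<grid, block>>>
// launch stays within the limits stored in prop.
bool launch_fits(dim3 grid, dim3 block, const cudaDeviceProp& prop)
{
    // Total threads in one block is the product of the three block dimensions.
    unsigned long long threadsPerBlock =
        (unsigned long long)block.x * block.y * block.z;

    bool blockOk =
        threadsPerBlock >= 1 &&
        threadsPerBlock <= (unsigned long long)prop.maxThreadsPerBlock &&
        block.x <= (unsigned)prop.maxThreadsDim[0] &&
        block.y <= (unsigned)prop.maxThreadsDim[1] &&
        block.z <= (unsigned)prop.maxThreadsDim[2];

    // Every grid dimension must be at least 1 and no larger than the device limit.
    bool gridOk =
        grid.x >= 1 && grid.x <= (unsigned)prop.maxGridSize[0] &&
        grid.y >= 1 && grid.y <= (unsigned)prop.maxGridSize[1] &&
        grid.z >= 1 && grid.z <= (unsigned)prop.maxGridSize[2];

    return blockOk && gridOk;
}
```

You would fill prop with cudaGetDeviceProperties(&prop, 0) and call launch_fits before the launch. With the limits listed in the question, it rejects the original launch in either argument order, because the grid Z dimension is greater than 1 both ways.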

For me, threadSize is a confusing name. I would have called it gridSize or something like that.

The second problem, as you've pointed out, is that for a cc1.x card (which seems to be what you have) your Z grid dimension must be 1. At least for your 42 case, you can fix this by restructuring the thread blocks to have a dimension of, say, (2,2,42), which is 168 threads per block, and your grid a dimension of, say, (27,21,1), which together cover the 54 x 42 x 42 volume.
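
As a rough sketch of that restructuring: the kernel body below is only a placeholder, since the actual interpolation code is not shown in the question. The float output type, the x-fastest memory layout, and the helper names div_up, grid_for_full_z_block and launch_small_case are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

// Ceiling division, so partial blocks at the volume edge are still covered.
__host__ __device__ inline unsigned div_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// 2D grid for a block that spans the whole Z extent (block.z == dimZ).
inline dim3 grid_for_full_z_block(int dimX, int dimY, dim3 block)
{
    return dim3(div_up(dimX, block.x), div_up(dimY, block.y), 1);
}

// Placeholder kernel: writes each voxel's linear index instead of interpolating.
__global__ void d_interpolate_kernel(float* output, int dimX, int dimY, int dimZ)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = threadIdx.z;  // gridDim.z is 1, so Z comes from the thread index alone

    if (x < dimX && y < dimY && z < dimZ) {
        int idx = (z * dimY + y) * dimX + x;  // assumes x varies fastest in memory
        output[idx] = (float)idx;
    }
}

// Launch for the 54 x 42 x 42 case; d_output is assumed to be device memory
// holding at least dimX * dimY * dimZ floats.
void launch_small_case(float* d_output)
{
    const int dimX = 54, dimY = 42, dimZ = 42;
    dim3 blockSize(2, 2, dimZ);                                    // 168 threads
    dim3 gridSize = grid_for_full_z_block(dimX, dimY, blockSize);  // (27, 21, 1)
    d_interpolate_kernel<<<gridSize, blockSize>>>(d_output, dimX, dimY, dimZ);
}
```

This does not carry over to the 84 case as is, because a block Z dimension of 84 exceeds the MAX_BLOCK_DIM_Z = 64 limit listed in the question.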

Otherwise, these indices are just arbitrary numbering schemes. You can come up with a 2D grid that covers all of your 3D volume, using a (6, 6, 6) block size if that is what you want. You just need to get creative about how you map the blockIdx.x and blockIdx.y built-in variables in your interpolation kernel, to simulate a 3D grid.
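
One possible mapping, sketched under assumptions: fold the Y and Z block counts into gridDim.y, then split blockIdx.y back into a logical (Y, Z) block coordinate inside the kernel. The helper names folded_grid and unfold_block_y are made up for this example, the output type is assumed to be float, and the kernel body is a placeholder for the real interpolation.

```cuda
#include <cuda_runtime.h>

// Ceiling division, so partial blocks at the volume edge are still covered.
__host__ __device__ inline unsigned div_up(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// Build a 2D grid whose Y dimension holds blocksY * blocksZ logical blocks,
// so that gridDim.z stays 1.
inline dim3 folded_grid(int dimX, int dimY, int dimZ, dim3 block)
{
    unsigned blocksX = div_up(dimX, block.x);
    unsigned blocksY = div_up(dimY, block.y);
    unsigned blocksZ = div_up(dimZ, block.z);
    return dim3(blocksX, blocksY * blocksZ, 1);
}

// Recover the logical Y and Z block coordinates from the folded blockIdx.y.
__host__ __device__ inline void unfold_block_y(unsigned foldedY, unsigned blocksY,
                                               unsigned* by, unsigned* bz)
{
    *by = foldedY % blocksY;
    *bz = foldedY / blocksY;
}

// Placeholder kernel: writes each voxel's linear index instead of interpolating.
__global__ void d_interpolate_kernel(float* output, int dimX, int dimY, int dimZ)
{
    unsigned blocksY = div_up(dimY, blockDim.y);
    unsigned by, bz;
    unfold_block_y(blockIdx.y, blocksY, &by, &bz);

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = by * blockDim.y + threadIdx.y;
    int z = bz * blockDim.z + threadIdx.z;

    if (x < dimX && y < dimY && z < dimZ) {
        int idx = (z * dimY + y) * dimX + x;  // assumes x varies fastest in memory
        output[idx] = (float)idx;
    }
}
```

With dim3 blockSize(6,6,6), folded_grid gives a grid of (9,49,1) for the 54 x 42 x 42 volume and (18,196,1) for 108 x 84 x 84, both within the 65535 limit on the X and Y grid dimensions, and the launch becomes d_interpolate_kernel<<<gridSize,blockSize>>>(output,dimX,dimY,dimZ).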

Upvotes: 3
