starter

Reputation: 325

Correctly computing gridDim for a CUDA kernel

I expected to see the numbers 0.0 to 999.0, but instead I am getting very strange, long numbers at some of the indices with the code below:

__global__ void kernel(double *res, int N)
{
  int i = (gridDim.y*blockIdx.y+
           blockIdx.x)*blockDim.x*blockDim.y+
           blockDim.y*threadIdx.y+threadIdx.x;
  if(i<N) res[i] = i;  
}

void callGPU(int N)
{
 dim3 dimBlock(8, 8);
 dim3 dimGrid(2, 8);
 ...
 kernel<<<dimGrid, dimBlock>>>(res, N);
 ...
} 

The same happens if I change dimGrid to (8, 2) or (1, 16), but if I change it to (16, 1) the indices come out right. Could you please show how to correctly compute gridDim for this case, and if possible for arbitrary N? Many thanks!

Upvotes: 0

Views: 1121

Answers (1)

geek

Reputation: 1839

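Your indexing pattern is wrong: the block offset multiplies blockIdx.y by gridDim.y, but the number of blocks per row of the grid is gridDim.x. That is why (16, 1) works while (2, 8) leaves some indices below N unwritten. First, compute the global index along the x and y dimensions separately: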

int i_x = blockIdx.x * blockDim.x + threadIdx.x;
int i_y = blockIdx.y * blockDim.y + threadIdx.y;

Then compute the pitch, i.e. the total number of threads along the x dimension of the whole grid:

int pitch = gridDim.x * blockDim.x;

Finally, compute the 1D index from the 2D coordinates:

int i = i_y * pitch + i_x;
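
For arbitrary N, the three steps above can be put together with a host-side grid computation. This is a minimal sketch, not a definitive implementation: the helper names check and fillIndices are hypothetical (they do not come from the question), and the roughly square grid is just one possible choice. With this indexing, any dimGrid whose block count times the threads per block is at least N should work, including the original (2, 8).

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <stdexcept>
#include <vector>

// Kernel using the indexing described above: global x/y coordinates,
// then a row-major 1D index with the full grid width as the pitch.
__global__ void kernel(double *res, int N)
{
    int i_x = blockIdx.x * blockDim.x + threadIdx.x;
    int i_y = blockIdx.y * blockDim.y + threadIdx.y;
    int pitch = gridDim.x * blockDim.x;   // threads per row of the whole grid
    int i = i_y * pitch + i_x;
    if (i < N) res[i] = i;                // guard: the grid may overshoot N
}

// Hypothetical helper (assumption, not from the question):
// turns any CUDA runtime error into a C++ exception.
static void check(cudaError_t err)
{
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical host wrapper (assumption): picks a grid that covers N
// elements, runs the kernel, and returns the results to the host.
std::vector<double> fillIndices(int N)
{
    std::vector<double> host(N > 0 ? N : 0);
    if (N <= 0) return host;

    dim3 dimBlock(8, 8);
    int threadsPerBlock = dimBlock.x * dimBlock.y;                 // 64
    int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;   // ceil(N / 64)

    // Arrange the blocks in a roughly square grid. Any gx * gy >= numBlocks
    // is valid; surplus threads are masked by the i < N guard in the kernel.
    int gx = static_cast<int>(std::ceil(std::sqrt(static_cast<double>(numBlocks))));
    int gy = (numBlocks + gx - 1) / gx;
    dim3 dimGrid(gx, gy);

    double *res = nullptr;
    check(cudaMalloc(&res, N * sizeof(double)));
    kernel<<<dimGrid, dimBlock>>>(res, N);
    check(cudaGetLastError());            // reports launch configuration errors
    // cudaMemcpy blocks until the kernel has finished.
    check(cudaMemcpy(host.data(), res, N * sizeof(double), cudaMemcpyDeviceToHost));
    check(cudaFree(res));
    return host;
}
```

For N = 1000 this sketch picks a 4 x 4 grid of 8 x 8 blocks (1024 threads), and the i < N guard masks the 24 surplus threads. The sketch does not free the device buffer if a later call throws, so real code would want a proper cleanup path.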

Upvotes: 1
