I expected to see the numbers 0.0 to 999.0, but instead I get some very weird, long numbers at some of the indices with the code below:
__global__ void kernel(double *res, int N)
{
    int i = (gridDim.y*blockIdx.y + blockIdx.x)*blockDim.x*blockDim.y
            + blockDim.y*threadIdx.y + threadIdx.x;
    if(i<N) res[i] = i;
}

void callGPU(int N)
{
    dim3 dimBlock(8, 8);
    dim3 dimGrid(2, 8);
    ...
    kernel<<<dimGrid, dimBlock>>>(res, N);
    ...
}
The same happens if I change dimGrid to (8,2) or (1,16), but if I change it to (16,1) I get the indices right. Could you please show how to compute the gridDim correctly for this case, ideally for an arbitrary N? Many thanks!
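Your indexing pattern is wrong: the block index should be scaled by gridDim.x, not gridDim.y (and the thread row by blockDim.x, not blockDim.y).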
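To see why only (16,1) works, the sketch below is plain host code (valid in a .cu file, no GPU needed) that replays the question's index expression for every block and thread a launch would create, and counts how many of the values 0 to N-1 actually get written. The function name coverageOfQuestionFormula is hypothetical, and N = 1000 is assumed from the expected output 0.0 to 999.0.

```cuda
#include <vector>

// Hypothetical helper: host-side replay (no GPU needed) of the index
// expression from the question. Counts how many distinct values in [0, N)
// the guarded write `if (i < N) res[i] = i;` would touch for a given
// grid/block shape.
int coverageOfQuestionFormula(int gridDimX, int gridDimY,
                              int blockDimX, int blockDimY, int N)
{
    std::vector<bool> written(N, false);
    for (int blockIdxY = 0; blockIdxY < gridDimY; ++blockIdxY)
        for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX)
            for (int threadIdxY = 0; threadIdxY < blockDimY; ++threadIdxY)
                for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
                    // Same expression as the question's kernel.
                    int i = (gridDimY * blockIdxY + blockIdxX) * blockDimX * blockDimY
                            + blockDimY * threadIdxY + threadIdxX;
                    if (i < N) written[i] = true;
                }
    int count = 0;
    for (bool w : written)
        if (w) ++count;
    return count;
}
```

With 8x8 blocks and N = 1000, (16,1) should write all 1000 entries, while (2,8), (8,2) and (1,16) write only 256, 640 and 64 respectively. Whenever gridDim.y > 1, some block numbers repeat and others are skipped, so parts of res are never written; those entries presumably hold whatever was already in device memory, which would explain the weird, long values.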
First, compute the index along the x and y dimensions:
int i_x = blockIdx.x * blockDim.x + threadIdx.x;
int i_y = blockIdx.y * blockDim.y + threadIdx.y;
Then compute the pitch, i.e. the total number of threads along the x dimension of the whole grid:
int pitch = gridDim.x * blockDim.x;
Finally, compute the 1D index from the 2D grid:
int i = i_y * pitch + i_x;
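Putting those steps together, here is a minimal sketch of the corrected kernel plus one way to choose a 2D grid for an arbitrary N, which the question also asked about. gridFor and fillIndices are hypothetical helpers, not CUDA APIs; gridFor rounds the thread count up to whole blocks and relies on the kernel's i < N guard to skip the surplus threads. The sketch assumes N > 0 and that every index fits in an int.

```cuda
#include <cuda_runtime.h>

// Corrected kernel: 2D global coordinates, then row-major flattening.
__global__ void kernel(double *res, int N)
{
    int i_x = blockIdx.x * blockDim.x + threadIdx.x;  // global column
    int i_y = blockIdx.y * blockDim.y + threadIdx.y;  // global row
    int pitch = gridDim.x * blockDim.x;               // threads per grid row
    int i = i_y * pitch + i_x;                        // unique 1D index per thread
    if (i < N) res[i] = i;                            // surplus threads do nothing
}

// Hypothetical helper (not part of CUDA): a 2D grid with at most gridX blocks
// per row whose total thread count is at least N, rounded up to whole blocks.
dim3 gridFor(int N, dim3 block, unsigned gridX)
{
    unsigned n = N > 0 ? static_cast<unsigned>(N) : 1u;
    unsigned threadsPerBlock = block.x * block.y;
    unsigned blocksNeeded = (n + threadsPerBlock - 1) / threadsPerBlock;
    unsigned gx = (gridX > 0 && gridX < blocksNeeded) ? gridX : blocksNeeded;
    return dim3(gx, (blocksNeeded + gx - 1) / gx);
}

// Hypothetical host wrapper (assumes N > 0): fills hostOut[0..N-1]
// with 0.0, 1.0, ..., N-1.
cudaError_t fillIndices(double *hostOut, int N)
{
    dim3 dimBlock(8, 8);
    dim3 dimGrid = gridFor(N, dimBlock, 2);  // (2, 8) when N == 1000

    double *res = nullptr;
    cudaError_t err = cudaMalloc(&res, N * sizeof(double));
    if (err != cudaSuccess) return err;

    kernel<<<dimGrid, dimBlock>>>(res, N);
    err = cudaGetLastError();  // catches launch-configuration errors
    if (err == cudaSuccess)    // cudaMemcpy waits for the kernel to finish
        err = cudaMemcpy(hostOut, res, N * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(res);
    return err;
}
```

For N = 1000, passing gridX = 1, 2, 8 or 16 to gridFor yields the grid shapes (1,16), (2,8), (8,2) and (16,1) from the question; with the corrected indexing each of them should fill res with 0.0 to 999.0, since the shape only enters the index through pitch. Keep in mind that gridDim.y is limited to 65535, so for very large N pick a larger gridX, or use a 1D grid, whose x dimension has a much larger limit.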