Reputation: 895
Is this the right expression for matrix indexing, given dim3 threadBlock(A, B, 1) and dim3 blockGrid(C, D, 1), where A, B, C, D are some numbers?
int i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
int j = (blockIdx.x * gridDim.y + blockIdx.y) * blockDim.y + threadIdx.y;
Upvotes: 2
Views: 3166
Reputation: 72348
That doesn't look correct to my eyes. The (i,j) indices of any thread in a two-dimensional CUDA grid are
int idx_i = blockIdx.x * blockDim.x + threadIdx.x;
int idx_j = blockIdx.y * blockDim.y + threadIdx.y;
If you are accessing an array stored in linear memory, the equivalent linear index for (i,j) is either
int mindex_colmajor = idx_i + idx_j * LDA;
or
int mindex_rowmajor = idx_j + idx_i * LDA;
depending on whether the array is stored in row major or column major order, with a leading dimension in memory (or equivalently pitch) equal to LDA. You then access the memory as
value = array[mindex]
where mindex is either the column major or row major index calculated above.
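Putting that together, here is a minimal sketch of a kernel using the indexing above. The kernel name scale2d, the parameters rows, cols, lda, alpha, and the choice to map the x direction to rows are my own assumptions for illustration, not something fixed by the question.

__global__ void scale2d(float *array, int rows, int cols, int lda, float alpha)
{
    // (i,j) position of this thread in the 2D grid
    int idx_i = blockIdx.x * blockDim.x + threadIdx.x;   // assumed row index
    int idx_j = blockIdx.y * blockDim.y + threadIdx.y;   // assumed column index

    if (idx_i < rows && idx_j < cols) {
        // row major linear index; lda is the leading dimension (pitch in elements)
        int mindex_rowmajor = idx_j + idx_i * lda;
        array[mindex_rowmajor] = alpha * array[mindex_rowmajor];
        // for column major storage you would use idx_i + idx_j * lda instead
    }
}

A possible launch, sized so the grid covers the whole matrix:

dim3 block(16, 16, 1);
dim3 grid((rows + block.x - 1) / block.x, (cols + block.y - 1) / block.y, 1);
scale2d<<<grid, block>>>(d_array, rows, cols, lda, 2.0f);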
Upvotes: 5
Reputation: 78498
That is one way to index into the matrix from threads, but not the only way. For example, I could change the kernel so that each thread deals with more than one matrix item.
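For instance, one common pattern along those lines (not necessarily what either answer had in mind) is a grid-stride loop over the matrix treated as a flat array, so the grid can be smaller than the element count. The kernel name scale_all and the parameter n (= rows * cols) are assumptions for this sketch.

__global__ void scale_all(float *array, int n, float alpha)
{
    int stride = blockDim.x * gridDim.x;                 // total threads in the grid
    for (int k = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's first element
         k < n;
         k += stride)                                    // advance by one whole grid
    {
        array[k] = alpha * array[k];
    }
}

Here each thread touches n / (blockDim.x * gridDim.x) elements on average, so the launch configuration no longer has to match the matrix dimensions exactly.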
Upvotes: 1