Reputation: 895
Is this the right expression for matrix indexing, given dim3 threadBlock(A, B, 1) and dim3 blockGrid(C, D, 1), where A, B, C, D are some numbers?
int i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
int j = (blockIdx.x * gridDim.y + blockIdx.y) * blockDim.y + threadIdx.y;
Upvotes: 2
Views: 3166
Reputation: 72348
That doesn't look correct to my eyes. The (i,j) indices of any thread in a two-dimensional CUDA grid are
int idx_i = blockIdx.x * blockDim.x + threadIdx.x;
int idx_j = blockIdx.y * blockDim.y + threadIdx.y;
If you are accessing an array stored in linear memory, the equivalent linear index for (i,j) is either
int mindex_colmajor = idx_i + idx_j * LDA;
or
int mindex_rowmajor = idx_j + idx_i * LDA;
depending on whether the array is stored in row major or column major order, with a leading dimension in memory (or equivalently pitch) equal to LDA. You then access the memory as
value = array[mindex]
where mindex is either the column major or row major index calculated above.
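Putting that together, here is a minimal sketch of a kernel using the indexing above. The kernel name scale2d, the parameters rows, cols, lda, alpha, and the choice to map the x direction to rows are my own assumptions for illustration, not something fixed by the question.

__global__ void scale2d(float *array, int rows, int cols, int lda, float alpha)
{
    // (i,j) position of this thread in the 2D grid
    int idx_i = blockIdx.x * blockDim.x + threadIdx.x;   // assumed row index
    int idx_j = blockIdx.y * blockDim.y + threadIdx.y;   // assumed column index

    if (idx_i < rows && idx_j < cols) {
        // row major linear index; lda is the leading dimension (pitch in elements)
        int mindex_rowmajor = idx_j + idx_i * lda;
        array[mindex_rowmajor] = alpha * array[mindex_rowmajor];
        // for column major storage you would use idx_i + idx_j * lda instead
    }
}

A possible launch, sized so the grid covers the whole matrix:

dim3 block(16, 16, 1);
dim3 grid((rows + block.x - 1) / block.x, (cols + block.y - 1) / block.y, 1);
scale2d<<<grid, block>>>(d_array, rows, cols, lda, 2.0f);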
Upvotes: 5
Reputation: 78498
That is one way to index into the matrix from threads, but not the only way. For example, I could change the kernel so that each thread deals with more than one matrix item.
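For instance, one common pattern along those lines (not necessarily what either answer had in mind) is a grid-stride loop over the matrix treated as a flat array, so the grid can be smaller than the element count. The kernel name scale_all and the parameter n (= rows * cols) are assumptions for this sketch.

__global__ void scale_all(float *array, int n, float alpha)
{
    int stride = blockDim.x * gridDim.x;                 // total threads in the grid
    for (int k = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's first element
         k < n;
         k += stride)                                    // advance by one whole grid
    {
        array[k] = alpha * array[k];
    }
}

Here each thread touches n / (blockDim.x * gridDim.x) elements on average, so the launch configuration no longer has to match the matrix dimensions exactly.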
Upvotes: 1