Get original matrix indices within a CUDA device

Question

I am passing a vectorized (flattened) representation of a 2D square matrix to a CUDA device. I have found online how to multiply two matrices stored in this format on a CUDA device.

However, I now need to recover the original indices of my matrix inside the kernel.

This is the host code that launches my kernel:

#define MATRIX_SIZE 20
#define BLOCK_SIZE 2
#define TILE_SIZE  2

void cuda_stuff(int sz, double **A)
{
  /* Flatten the 2D host matrix into a contiguous 1D array. */
  double* A1d = matrix_to_vector(sz, A);
  double* d_A;
  size_t sizeA = sz * sz * sizeof(double);
  cudaMalloc(&d_A, sizeA);
  cudaMemcpy(d_A, A1d, sizeA, cudaMemcpyHostToDevice);
  /* One thread per matrix element, BLOCK_SIZE x BLOCK_SIZE threads per block. */
  dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
  dim3 grid(MATRIX_SIZE / threads.x, MATRIX_SIZE / threads.y);
  cudakernel<<<grid, threads>>>(sz, d_A);
}

This is my cudakernel

__global__ void cudakernel(int sz, double* A_d)
{
  int tx = blockIdx.x * TILE_SIZE + threadIdx.x;
  int ty = blockIdx.y * TILE_SIZE + threadIdx.y;

  /* Need to get original i, j from my matrix double* A */
}

How can I get the original indices [i][j] of my matrix double* A?

Robert Crovella · Accepted Answer

Your code will only work properly if MATRIX_SIZE is evenly divisible by BLOCK_SIZE (and BLOCK_SIZE must be the same as TILE_SIZE). This code appears to be set up to handle square matrices only, so I am assuming your original A matrix is of size (MATRIX_SIZE, MATRIX_SIZE).

Given that proviso, the following should retrieve the element of A corresponding to a given thread:

double my_A_element  = A_d[ty*MATRIX_SIZE+tx];

If you prefer (again, given the above proviso), you can use the built-in variables:

double my_A_element  = A_d[ty*(blockDim.x*gridDim.x)+tx];

or, equivalently:

double my_A_element  = A_d[ty*sz+tx];
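
To see why the three expressions only agree under the divisibility proviso, here is a small host-side sketch in plain C (not device code). It mirrors the question's grid computation, which uses truncating integer division; the helper names blocks_per_dim, threads_per_dim, and strides_agree are hypothetical and do not come from the question or from CUDA.

```c
#include <assert.h>

/* Blocks per grid dimension, computed as in the question:
   MATRIX_SIZE / threads.x, which truncates toward zero. */
unsigned blocks_per_dim(unsigned matrix_size, unsigned block_size)
{
    assert(block_size > 0);
    return matrix_size / block_size;
}

/* Threads launched along one dimension; on the device this is
   the value of blockDim.x * gridDim.x. */
unsigned threads_per_dim(unsigned matrix_size, unsigned block_size)
{
    return block_size * blocks_per_dim(matrix_size, block_size);
}

/* Returns 1 when blockDim.x * gridDim.x equals the matrix width, so it
   can stand in for MATRIX_SIZE (or sz) as the row stride; 0 otherwise. */
int strides_agree(unsigned matrix_size, unsigned block_size)
{
    return threads_per_dim(matrix_size, block_size) == matrix_size;
}
```

With the question's values (MATRIX_SIZE 20, BLOCK_SIZE 2) the grid is 10 blocks wide and 10 * 2 = 20, so the strides agree. With a block size of 3 the grid would be 6 blocks wide, giving 18 threads per dimension, so the last two rows and columns would get no thread and blockDim.x*gridDim.x would no longer be a valid row stride.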

Regarding the indices: for the my_A_element variables defined above, tx as defined in your kernel gives the original column index into A, and ty gives the original row index into A.

Therefore the original element of A (corresponding to my_A_element) is just A[ty][tx], that is, i = ty and j = tx.
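
The index mapping can also be checked on the host with a short C sketch. It assumes that matrix_to_vector (not shown in the question) flattens the matrix in row-major order, which is what the indexing above relies on. The helper names flat_index, row_of, col_of, and thread_flat_index are hypothetical, and thread_flat_index only imitates on the CPU what a single thread would compute from blockIdx and threadIdx.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major flattening (assumed): A[row][col] of an sz x sz matrix
   is stored at offset row * sz + col in the 1D array. */
size_t flat_index(size_t row, size_t col, size_t sz)
{
    assert(row < sz && col < sz);
    return row * sz + col;
}

/* Inverse mapping: recover the original row i from a flat offset. */
size_t row_of(size_t idx, size_t sz) { return idx / sz; }

/* Inverse mapping: recover the original column j from a flat offset. */
size_t col_of(size_t idx, size_t sz) { return idx % sz; }

/* Host-side stand-in for one thread's index computation.
   block_x/block_y play the role of blockIdx.x/.y and
   thread_x/thread_y the role of threadIdx.x/.y. */
size_t thread_flat_index(unsigned block_x, unsigned block_y,
                         unsigned thread_x, unsigned thread_y,
                         unsigned block_size, size_t sz)
{
    size_t tx = (size_t)block_x * block_size + thread_x; /* column j */
    size_t ty = (size_t)block_y * block_size + thread_y; /* row i */
    return flat_index(ty, tx, sz);
}
```

For example, with a block size of 2 and sz = 20, the thread at block (2, 1) and thread offset (0, 1) has tx = 4 and ty = 3, so it reads offset 3 * 20 + 4 = 64, which is A[3][4]. If only the flat offset is available, row_of and col_of recover the same pair.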
