Reputation: 7316
I am passing a vectorized (flattened, 1D) representation of a 2D square matrix to a CUDA device. I have found online how to perform matrix multiplication on a CUDA device with two matrices in this format.
However, I now need to recover the original indices of my matrix inside the kernel.
This is the host code that passes the matrix to my kernel:
#define MATRIX_SIZE 20
#define BLOCK_SIZE 2
#define TILE_SIZE 2
void cuda_stuff(int sz, double **A)
{
    // Flatten the 2D host matrix into a 1D host array.
    double* A1d = matrix_to_vector(sz, A);
    double* d_A;
    size_t sizeA = sz * sz * sizeof(double);
    // Allocate device memory and copy the flattened matrix to it.
    cudaMalloc(&d_A, sizeA);
    cudaMemcpy(d_A, A1d, sizeA, cudaMemcpyHostToDevice);
    // One thread per matrix element.
    dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(MATRIX_SIZE / threads.x, MATRIX_SIZE / threads.y);
    cudakernel<<<grid, threads>>>(sz, d_A);
}
This is my kernel, cudakernel:
__global__ void cudakernel(int sz, double* A_d)
{
    int tx = blockIdx.x * TILE_SIZE + threadIdx.x;
    int ty = blockIdx.y * TILE_SIZE + threadIdx.y;
    /* Need to get original i, j from my matrix double* A */
}
How can I get the original indices [i][j] of my matrix A inside the kernel?
Upvotes: 0
Views: 72
Reputation: 151799
Your code will only work properly if MATRIX_SIZE is evenly divisible by BLOCK_SIZE (and BLOCK_SIZE must be the same as TILE_SIZE). This code appears to be set up to handle square matrices only, so I am assuming your original A matrix is of size (MATRIX_SIZE, MATRIX_SIZE).
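As an aside, if the divisibility condition cannot be guaranteed, a common workaround is to round the grid size up and have each thread check `tx < sz && ty < sz` before touching memory. The following is only a host-side C++ sketch of that idea, not code from the question: `blocks_needed` and `emulate_guarded_launch` are hypothetical names, and the nested loops merely stand in for the blocks and threads of a real launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: number of blocks needed to cover n elements
// when each block handles block_size of them.
int blocks_needed(int n, int block_size) {
    assert(n > 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Host-side stand-in for a guarded kernel launch (hypothetical helper).
// Returns, for every element of an sz x sz matrix, how many emulated
// threads touched it when the grid is rounded up and indices are guarded.
std::vector<int> emulate_guarded_launch(int sz, int block_size) {
    std::vector<int> hits(static_cast<std::size_t>(sz) * sz, 0);
    const int grid = blocks_needed(sz, block_size);
    for (int by = 0; by < grid; ++by)
        for (int bx = 0; bx < grid; ++bx)
            for (int thy = 0; thy < block_size; ++thy)
                for (int thx = 0; thx < block_size; ++thx) {
                    int tx = bx * block_size + thx;  // column index
                    int ty = by * block_size + thy;  // row index
                    // The guard a real kernel would need for the
                    // partially filled blocks at the right/bottom edge.
                    if (tx < sz && ty < sz)
                        ++hits[ty * sz + tx];
                }
    return hits;
}
```

With `sz = 5` and a block size of 2, this uses a 3 x 3 grid, and the guard skips the threads that fall outside the matrix, so each element is visited exactly once.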
Given that proviso, the following should retrieve the original element A corresponding to a given thread:
double my_A_element = A_d[ty*MATRIX_SIZE+tx];
If you prefer (again, given the above proviso), you can use the built-in variables:
double my_A_element = A_d[ty*(blockDim.x*gridDim.x)+tx];
or, equivalently, provided sz is equal to MATRIX_SIZE:
double my_A_element = A_d[ty*sz+tx];
Regarding the indices, the tx variable is properly defined to give you the original column index into A, and the ty variable is properly defined to give you the original row index into A, for the above defined my_A_element variables.
Therefore the original element of A (corresponding to my_A_element) is just A[ty][tx].
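All of the above relies on the flattened array being stored in row-major order. The question does not show `matrix_to_vector`, so the following host-side sketch assumes it copies row by row; `flatten_row_major`, `RowCol` and `unflatten_index` are hypothetical names used only for illustration. It shows the forward mapping `idx = row * sz + col` and its inverse.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed behaviour of matrix_to_vector: a row-major copy, so that
// element A[i][j] ends up at flat[i * sz + j].
std::vector<double> flatten_row_major(int sz, double** A) {
    assert(sz > 0);
    std::vector<double> flat(static_cast<std::size_t>(sz) * sz);
    for (int i = 0; i < sz; ++i)
        for (int j = 0; j < sz; ++j)
            flat[i * sz + j] = A[i][j];
    return flat;
}

struct RowCol {
    int row;  // corresponds to ty in the kernel
    int col;  // corresponds to tx in the kernel
};

// Inverse mapping: recover (row, col) from a linear row-major index.
RowCol unflatten_index(int sz, int idx) {
    return { idx / sz, idx % sz };
}
```

For a 3 x 3 matrix, linear index 7 maps back to row 2, column 1. If the flattening were column-major instead, the roles of ty and tx in the index expression would have to be swapped.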
Upvotes: 2