CUDA: matrix multiplication using shared and global memory

I'm trying to do a matrix multiplication between a 3x3 matrix and a 360x360 matrix. The smaller (3x3) matrix will be multiplied with the first 3x3 block of the big matrix, then the next block, and so forth. Hence I want to keep my smaller matrix constant and slide it over my bigger matrix.

Is it possible to store my smaller matrix in shared memory and have my bigger matrix divided into 3x3 tiles in global memory?

I can't find a way to copy the smaller matrix to shared memory directly from the host. Kindly correct me if my mental model of CUDA is wrong.

Thanks.

Upvotes: 0

Views: 561

Answers (1)

ptrendx

Reputation: 326

It is not possible to populate shared memory from the host.

However, the best way to handle values that are constant across all threads, such as the 3x3 matrix from your example, is to put them in constant memory (which is 64 KB in size). There are two ways of using constant memory:

  • The easiest way is to use kernel arguments. Define a struct containing your kernel arguments, including the 3x3 matrix, and pass it to your kernel (a sketch of this approach follows the code below).
  • Use the __constant__ type qualifier and cudaMemcpyToSymbol to populate it from the host:

    // In global scope
    __constant__ float mat_gpu[3][3];

    // In the function that you use to populate the data
    cudaMemcpyToSymbol(mat_gpu, mat_cpu, 9 * sizeof(float));

    // In your kernel you just use the mat_gpu variable

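For the first option, here is a minimal sketch of passing the 3x3 matrix by value inside a kernel-argument struct. The names KernelArgs and readMat are hypothetical, chosen for illustration only:

    // A minimal sketch of option 1; KernelArgs and readMat are
    // hypothetical names, not from any API.
    struct KernelArgs {
        float mat[3][3];   // the 3x3 matrix travels by value with the launch
    };

    __global__ void readMat(float *out, KernelArgs args)
    {
        // Every thread can read the matrix straight from the argument;
        // no explicit copy of the small matrix is needed.
        int i = threadIdx.x;               // launched with 9 threads
        if (i < 9)
            out[i] = args.mat[i / 3][i % 3];
    }

    // Host side: fill the struct and pass it like any other argument.
    // KernelArgs args;
    // memcpy(args.mat, mat_cpu, sizeof(args.mat));
    // readMat<<<1, 9>>>(d_out, args);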

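To connect the second option to your sliding use case, here is a sketch of a kernel that reads mat_gpu from constant memory and multiplies it with each 3x3 tile of the 360x360 matrix. The kernel name tileMul, the row-major layout, and the one-thread-per-tile mapping are assumptions for illustration, not part of the answer above:

    #define N 360   // the big matrix is 360x360, i.e. 120x120 tiles of 3x3

    __global__ void tileMul(const float *big, float *out)
    {
        int tx = blockIdx.x * blockDim.x + threadIdx.x;  // tile column
        int ty = blockIdx.y * blockDim.y + threadIdx.y;  // tile row
        if (tx >= N / 3 || ty >= N / 3) return;

        // 3x3 product: mat_gpu (constant memory) times the current tile
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < 3; ++k)
                    acc += mat_gpu[i][k] * big[(ty * 3 + k) * N + (tx * 3 + j)];
                out[(ty * 3 + i) * N + (tx * 3 + j)] = acc;
            }
    }

    // Launch with one thread per tile, e.g.:
    // dim3 block(16, 16);
    // dim3 grid((120 + 15) / 16, (120 + 15) / 16);
    // tileMul<<<grid, block>>>(d_big, d_out);
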
Upvotes: 1
