I'm trying to do a matrix multiplication between a 3x3 matrix and a 360x360 matrix. The smaller (3x3) matrix is multiplied with the first 3x3 block of the big matrix, then the next block, and so forth. In other words, I want to keep my smaller matrix constant and slide it over my bigger matrix.
Is it possible to store my smaller matrix in shared memory and have my bigger matrix divided into 3x3 blocks in global memory?
I can't find a way to copy the smaller matrix to shared memory directly from the host. Kindly correct me if my mental model of CUDA is wrong.
Thanks.
It is not possible to populate shared memory from the host.
However, the best way to handle constants that all threads read, such as the 3x3 matrix from your example, is to put them in constant memory (which is 64 kB in size). There are two ways of using constant memory:

1. Declare a struct containing your kernel arguments, including the 3x3 matrix, and pass it to your kernel by value; kernel arguments are themselves stored in constant memory (a sketch of this approach follows at the end of this answer).
2. Use the __constant__ type qualifier and cudaMemcpyToSymbol to populate it from the host:
//In global scope
__constant__ float mat_gpu[3][3];
//In the host function where you populate the data
cudaMemcpyToSymbol(mat_gpu, mat_cpu, 9 * sizeof(float));
//In your kernel you just use the mat_gpu variable
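
To make that concrete, here is a sketch of a kernel that uses mat_gpu to do the block-wise multiplication from the question. The kernel name blockMul3x3, the buffer names d_big and d_out, and the 18x18 launch geometry are my own assumptions, not a fixed API:

//Multiplies the constant 3x3 matrix with every 3x3 block of an n x n
//matrix (n must be divisible by 3, e.g. 360); one thread per output element
__global__ void blockMul3x3(const float *big, float *out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;

    //top row of the 3x3 block this element falls in
    int blockRow = (row / 3) * 3;

    //standard 3x3 product for this block: out_block = mat_gpu * big_block
    float acc = 0.0f;
    for (int k = 0; k < 3; ++k)
        acc += mat_gpu[row - blockRow][k] * big[(blockRow + k) * n + col];
    out[row * n + col] = acc;
}

//Host-side launch, assuming d_big and d_out are device buffers of
//360 * 360 floats and mat_gpu was already filled with cudaMemcpyToSymbol
dim3 threads(18, 18);            //18 divides 360 evenly
dim3 blocks(360 / 18, 360 / 18);
blockMul3x3<<<blocks, threads>>>(d_big, d_out, 360);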
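If you prefer the first (struct) approach, here is a minimal sketch; the names KernelArgs and structMul3x3 are made up for illustration. Because kernel arguments are themselves stored in constant memory, passing the struct by value gives every thread cached access to the 3x3 matrix without any explicit copy:

struct KernelArgs {
    float mat[3][3];   //the small constant matrix
    int   n;           //dimension of the big square matrix (360 here)
};

//Same block-wise product as above, but reading the 3x3 matrix straight
//from the kernel argument instead of a __constant__ variable
__global__ void structMul3x3(KernelArgs args, const float *big, float *out)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= args.n || col >= args.n) return;

    int blockRow = (row / 3) * 3;
    float acc = 0.0f;
    for (int k = 0; k < 3; ++k)
        acc += args.mat[row - blockRow][k] * big[(blockRow + k) * args.n + col];
    out[row * args.n + col] = acc;
}

//Host side: fill the struct and pass it by value at launch
KernelArgs args;
args.n = 360;
for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
        args.mat[i][j] = mat_cpu[i][j];
structMul3x3<<<blocks, threads>>>(args, d_big, d_out);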