CUDA: matrix multiplication using shared and global memory

I'm trying to do a matrix multiplication between a 3x3 matrix and a 360x360 matrix. The smaller (3x3) matrix will be multiplied with the first 3x3 block of the big matrix, then the next block, and so forth. Hence I want to keep my smaller matrix constant and slide it over my bigger matrix.

Is it possible to store my smaller matrix in shared memory and have my bigger matrix divided into 3x3 tiles in global memory?

I can't find a way to copy the smaller matrix to shared memory directly from the host. Kindly correct me if my mental model of CUDA is wrong.

Thanks.

Upvotes: 0

Views: 561

Answers (1)

ptrendx

Reputation: 326

It is not possible to populate shared memory from the host.

However, the best way to handle values that are constant across all threads, such as the 3x3 matrix from your example, is to put them in constant memory (which is 64 KB in size). There are two ways of using constant memory:

  • The easiest way is to use kernel arguments. Define a struct containing your kernel arguments, including the 3x3 matrix, and pass it to your kernel (a sketch of this approach follows the code below).
  • Use the __constant__ type qualifier and cudaMemcpyToSymbol to populate it from the host:

    // In global scope
    __constant__ float mat_gpu[3][3];

    // In the function that you use to populate the data
    cudaMemcpyToSymbol(mat_gpu, mat_cpu, 9 * sizeof(float));

    // In your kernel you just use the mat_gpu variable

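For the first option, here is a minimal sketch of passing the 3x3 matrix by value inside a kernel-argument struct. The names KernelArgs and readMat are hypothetical, chosen for illustration only:

    // A minimal sketch of option 1; KernelArgs and readMat are
    // hypothetical names, not from any API.
    struct KernelArgs {
        float mat[3][3];   // the 3x3 matrix travels by value with the launch
    };

    __global__ void readMat(float *out, KernelArgs args)
    {
        // Every thread can read the matrix straight from the argument;
        // no explicit copy of the small matrix is needed.
        int i = threadIdx.x;               // launched with 9 threads
        if (i < 9)
            out[i] = args.mat[i / 3][i % 3];
    }

    // Host side: fill the struct and pass it like any other argument.
    // KernelArgs args;
    // memcpy(args.mat, mat_cpu, sizeof(args.mat));
    // readMat<<<1, 9>>>(d_out, args);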

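To connect the second option to your sliding use case, here is a sketch of a kernel that reads mat_gpu from constant memory and multiplies it with each 3x3 tile of the 360x360 matrix. The kernel name tileMul, the row-major layout, and the one-thread-per-tile mapping are assumptions for illustration, not part of the answer above:

    #define N 360   // the big matrix is 360x360, i.e. 120x120 tiles of 3x3

    __global__ void tileMul(const float *big, float *out)
    {
        int tx = blockIdx.x * blockDim.x + threadIdx.x;  // tile column
        int ty = blockIdx.y * blockDim.y + threadIdx.y;  // tile row
        if (tx >= N / 3 || ty >= N / 3) return;

        // 3x3 product: mat_gpu (constant memory) times the current tile
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < 3; ++k)
                    acc += mat_gpu[i][k] * big[(ty * 3 + k) * N + (tx * 3 + j)];
                out[(ty * 3 + i) * N + (tx * 3 + j)] = acc;
            }
    }

    // Launch with one thread per tile, e.g.:
    // dim3 block(16, 16);
    // dim3 grid((120 + 15) / 16, (120 + 15) / 16);
    // tileMul<<<grid, block>>>(d_big, d_out);
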
Upvotes: 1
