Bruno Brunolav

Reputation: 33

CUDA matrix multiplication: wrong result

This is my code for matrix multiplication, but when I run it I get the correct result for the first row and wrong ones for the second and third rows (mostly large negative numbers). This is my first program, so I used some code that I found on the net.

#include <cstdio>    // printf
#include <iostream>

// MnozenjeMatrica = "matrix multiplication": each thread computes one element of C = A * B
__global__ void MnozenjeMatrica(int* d_c, int* d_a, int* d_b)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    int d = 0;
    for (int i = 0; i < 3; i++)
    {
        int x = d_a[row * 3 + i];
        int y = d_b[i * 3 + col];
        d += x * y;
    }

    d_c[row * 3 + col] = d;
}

int main()
{
    const int SIZE = 9 * sizeof(int);

    int a[3][3] = {{2, 4, 6}, {1, 3, 5}, {8, 4, 1}};
    int b[3][3] = {{5, 8, 34}, {5, 7, 5}, {1, 4, 31}};
    int c[3][3] = {{5, 8, 34}, {5, 7, 5}, {1, 4, 31}};

    int* d_a;
    int* d_b;
    int* d_c;

    cudaMalloc((void**) &d_a, SIZE);
    cudaMalloc((void**) &d_b, SIZE);
    cudaMalloc((void**) &d_c, SIZE);

    cudaMemcpy(d_a, a, SIZE, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, SIZE, cudaMemcpyHostToDevice);

    MnozenjeMatrica<<<3, 3>>>(d_c, d_a, d_b);   // 3 blocks of 3 threads each
    cudaMemcpy(c, d_c, SIZE, cudaMemcpyDeviceToHost);

    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < 3; j++)
        {
            printf("%d\t", c[i][j]);
        }
        printf("\n");
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}

Upvotes: 0

Views: 372

Answers (1)

Robert Crovella

Reputation: 152173

Completely agree with @talonmies.

More suggestions:

  • Plenty of people have posted questions about CUDA matrix multiplication; you might look at some of those for ideas.
  • You're not doing any error checking on kernel launches or CUDA API calls, and it's recommended.
  • You might try running your code with cuda-memcheck and see what it reports (cuda-memcheck was removed in CUDA 12; its replacement is compute-sanitizer).
  • You could debug this kernel pretty quickly with a few well-placed printf statements. This is mostly C code, after all, so consider using basic C troubleshooting techniques.
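A minimal sketch of the error checking and printf debugging suggested above. CUDA_CHECK, printIndices, and debugLaunch are hypothetical names; the macro is one common pattern, not an official CUDA API. The debug kernel only prints the indices each thread would use, with the question's launch configuration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: prints the error with file/line and aborts
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Debug kernel: each thread prints its block/thread coordinates and the
// row/col it would compute with the question's indexing formulas.
__global__ void printIndices()
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block (%u,%u) thread (%u,%u) -> row %d, col %d\n",
           blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y, row, col);
}

// Launches the debug kernel with the same <<<3, 3>>> configuration as the question.
void debugLaunch()
{
    printIndices<<<3, 3>>>();
    CUDA_CHECK(cudaGetLastError());       // launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // execution errors; also flushes device printf output
}
```

Calling debugLaunch() from main prints one line per thread, which makes it easy to see which row and col values the kernel actually works on.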

Since I was able to spot this quickly, I can tell you that your kernel depends on a 2-D threadblock structure to do anything useful:

int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
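To see what those two lines compute for a given launch, a host-only sketch can enumerate every thread and apply the same formulas without a GPU. It is plain C++ (it also compiles in a .cu file); Dim, ThreadIndex, and emulateIndices are hypothetical names, and Dim stands in for the x and y components of dim3.

```cuda
// Host-only sketch: reproduces the kernel's row/col arithmetic for every
// thread of a launch, without running anything on the GPU.
#include <vector>

// Hypothetical stand-in for dim3 (x and y components only).
struct Dim { int x, y; };

// The (row, col) pair one thread computes.
struct ThreadIndex { int row, col; };

// Enumerates all threads of a grid of `grid` blocks, each of `block` threads,
// and applies the kernel's formulas:
//   row = blockIdx.y * blockDim.y + threadIdx.y
//   col = blockIdx.x * blockDim.x + threadIdx.x
std::vector<ThreadIndex> emulateIndices(Dim grid, Dim block)
{
    std::vector<ThreadIndex> out;
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx)
                    out.push_back({by * block.y + ty, bx * block.x + tx});
    return out;
}
```

An integer launch argument such as <<<3, 3>>> behaves like dim3(3, 1, 1) for both grid and block, so emulateIndices({3, 1}, {3, 1}) corresponds to that launch: all nine threads get row 0 and col runs from 0 to 8. By contrast, emulateIndices({1, 1}, {3, 3}) (one 3 x 3 block) covers rows 0 to 2 and cols 0 to 2.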

But you are launching a 1-D grid of 1-D threadblocks:

MnozenjeMatrica<<<3, 3>>>(d_c, d_a, d_b);
                  ^  ^
                  |  1-D threadblock (3 threads)
                  1-D grid (3 blocks)

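With that launch, blockIdx.y and threadIdx.y are always 0, so every thread computes row = 0 while col runs from 0 to 8. The three threads with col < 3 compute the first row correctly; the other six read past the end of the 9-element d_b array and write their results into d_c[3] through d_c[8], i.e. the second and third rows, which explains the garbage values. So I'm not surprised it only works for a single row.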
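A minimal sketch of one way to fix it, assuming you want to keep the kernel's 2-D indexing: launch a 2-D grid of 2-D blocks and add a bounds check so threads outside the matrix do nothing. The constant N, the helper multiplyOnDevice, and the 16 x 16 block size are illustrative choices, not taken from the question.

```cuda
#include <cuda_runtime.h>

constexpr int N = 3;  // matrix dimension from the question

// The question's kernel with N in place of the literal 3 and a bounds check,
// so threads that fall outside the N x N matrix do nothing.
__global__ void MnozenjeMatrica(int* d_c, const int* d_a, const int* d_b)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;

    int d = 0;
    for (int i = 0; i < N; i++)
        d += d_a[row * N + i] * d_b[i * N + col];
    d_c[row * N + col] = d;
}

// Hypothetical host helper: multiplies row-major N x N matrices a and b into c
// on the GPU. Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t multiplyOnDevice(const int* a, const int* b, int* c)
{
    const size_t bytes = N * N * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    cudaError_t err = cudaMalloc(&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // 2-D launch: threadIdx.y and blockIdx.y now vary, so row is not always 0.
        // Enough 16 x 16 blocks to cover N x N; the bounds check discards the rest.
        dim3 block(16, 16);
        dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
        MnozenjeMatrica<<<grid, block>>>(d_c, d_a, d_b);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // cudaMemcpy waits for the kernel and may also report errors from its execution.
    if (err == cudaSuccess) err = cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);  // cudaFree on a null pointer is a no-op
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}
```

With the question's a and b passed as flattened row-major arrays, c should come back as 36 68 274 / 25 49 204 / 61 96 323. For this 3 x 3 case a single dim3(3, 3) block would also work; the rounded-up grid plus bounds check is the form that keeps working for larger N.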

Upvotes: 2
