CUDA multiplication of an m-by-n matrix by an n-by-1 vector

Question

The following kernel multiplies two n-by-n matrices:

    __global__ void matrixMultiplication(const double *A, const double *B, double *C, int N)
    {
        int i = blockDim.y * blockIdx.y + threadIdx.y;
        int j = blockDim.x * blockIdx.x + threadIdx.x;
        double value = 0;
        for (int k = 0; k < N; k++) {
            value += A[k * N + j] * B[i * N + k];
        }
        C[i * N + j] = value;
    }

I use the above kernel in MATLAB like this:

    k = parallel.gpu.CUDAKernel('matrixMultiplication.ptx', 'matrixMultiplication.cu');
    A = rand(3,4);
    b = rand(4,1);
    C = zeros(3,1);
    k.ThreadBlockSize = [3,4,1];
    k.GridSize = [1, 1];
    D = A*b;
    C = feval(k,A,b,C,4);
    D-C

but the result is not zero! How can I change this kernel so that it multiplies an m-by-n matrix by an n-by-1 vector?


Answers (1)
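
The kernel in the question assumes square N-by-N matrices: it uses N as the column stride of A, and it expects one thread per element of an N-by-N result. With a 3-by-4 matrix and N = 4, the stride should be the number of rows (3), because MATLAB stores arrays in column-major order. The 3-by-4 thread block also launches 12 threads, so the threads with i > 0 read past the end of b and write past the end of C. Below is a minimal sketch of a matrix-vector kernel that takes both dimensions. The names matrixVectorMultiplication, matVecOnGpu, M, x and y are assumptions made for illustration, and the host wrapper is only there so the kernel can be tried outside MATLAB (error checking is omitted).

```cuda
#include <cuda_runtime.h>
#include <vector>

// y = A * x, where A is M-by-N stored in column-major order (MATLAB's layout),
// x has N elements and y has M elements. One thread computes one entry of y.
__global__ void matrixVectorMultiplication(const double *A, const double *x,
                                           double *y, int M, int N)
{
    int row = blockDim.x * blockIdx.x + threadIdx.x;
    if (row >= M) return;                // guard threads past the last row
    double value = 0.0;
    for (int k = 0; k < N; k++) {
        value += A[k * M + row] * x[k];  // column stride is M, not N
    }
    y[row] = value;
}

// Hypothetical host wrapper for standalone testing (no error checking).
std::vector<double> matVecOnGpu(const std::vector<double> &A,
                                const std::vector<double> &x, int M, int N)
{
    double *dA, *dx, *dy;
    cudaMalloc(&dA, sizeof(double) * M * N);
    cudaMalloc(&dx, sizeof(double) * N);
    cudaMalloc(&dy, sizeof(double) * M);
    cudaMemcpy(dA, A.data(), sizeof(double) * M * N, cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), sizeof(double) * N, cudaMemcpyHostToDevice);

    int threads = 128;                          // threads per block (arbitrary choice)
    int blocks = (M + threads - 1) / threads;   // enough blocks to cover M rows
    matrixVectorMultiplication<<<blocks, threads>>>(dA, dx, dy, M, N);

    std::vector<double> y(M);
    cudaMemcpy(y.data(), dy, sizeof(double) * M, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dx);
    cudaFree(dy);
    return y;
}
```

Under the same assumptions, and with the CUDAKernel object built from this kernel instead, the MATLAB side would set k.ThreadBlockSize = [3,1,1] and call C = feval(k,A,b,C,3,4), so that one thread runs per row and both dimensions are passed. D-C should then be zero up to rounding error.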
