CUDA matrix addition

Question

I have written the following code to add two 4x4 matrices in CUDA.

#include
#include
#include

__global__ void Matrix_add(double* a, double* b, double* c,int n)
{
   int row = blockIdx.x * blockDim.x + threadIdx.x;
   int col = blockIdx.y * blockDim.y + threadIdx.y;
   int index = row * n + col;
   if(col<n && row<n)
      c[index] = a[index] + b[index];
}

...

Matrix_add<<<...>>>(d_a,d_b,d_c,n);
cudaMemcpy(h_c+n,d_c,size,cudaMemcpyDeviceToHost);

for(i=0;i



The result of this addition should be a 2x2 all-ones matrix, but every element of the result is 0. I also get this message after the result is printed:


  Segmentation fault (core dumped)


Can anyone please help me find the problem?

Thank you

Troels Henriksen · Accepted Answer

Your host arrays (h_a, h_b, h_c) are not contiguous in memory, so your initial cudaMemcpy() calls copy garbage into GPU memory (apparently zeros in your case).

The reason is that your host arrays are not actually flat; they are represented as arrays of pointers, presumably to emulate two-dimensional arrays in C. A single cudaMemcpy() from such an array copies the row pointers and whatever follows them in memory, not the matrix elements. You either need to be more careful with your cudaMemcpy() calls and copy the host arrays row by row, or use a flat representation on the host.
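As a minimal sketch of the flat-representation option (not the asker's original program, most of which is missing above), the host matrices can be allocated as single contiguous blocks and indexed as row * n + col. The size n = 4, the fill values 0.25 and 0.75, the 16x16 block shape, and the lower-case kernel name matrix_add are assumptions chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Element-wise c = a + b for n x n matrices stored row-major in flat arrays.
__global__ void matrix_add(const double* a, const double* b, double* c, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n) {
        int index = row * n + col;
        c[index] = a[index] + b[index];
    }
}

int main()
{
    const int n = 4;                              // assumed matrix dimension
    const size_t size = n * n * sizeof(double);   // bytes in one whole matrix

    // One contiguous block per matrix, so a single cudaMemcpy covers it all.
    double* h_a = (double*)malloc(size);
    double* h_b = (double*)malloc(size);
    double* h_c = (double*)malloc(size);
    for (int i = 0; i < n * n; ++i) {
        h_a[i] = 0.25;   // illustrative values; each sum is exactly 1.0
        h_b[i] = 0.75;
    }

    double *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, size);
    cudaMalloc(&d_b, size);
    cudaMalloc(&d_c, size);

    cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    matrix_add<<<grid, block>>>(d_a, d_b, d_c, n);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy into the start of h_c; this cudaMemcpy waits for the kernel to finish.
    cudaMemcpy(h_c, d_c, size, cudaMemcpyDeviceToHost);

    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col)
            printf("%.1f ", h_c[row * n + col]);
        printf("\n");
    }

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

If the array-of-pointers layout has to stay on the host, the alternative is one cudaMemcpy() per row, each copying n * sizeof(double) bytes between the host row and the device offset d_a + row * n. Copying a whole matrix's worth of bytes back into an array that only holds n row pointers would write past its end, which may well be the source of the segmentation fault, although the missing code makes that impossible to confirm.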
