CUDA copying array from device to host using cudaMemcpy2D

Question

cudaMemcpy2D doesn't copy what I expected. After reading the manual on cudaMallocPitch, I tried writing some code to understand what's going on, but I ran into a problem.

I made a simple program like this:

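#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define SIZE 5

__global__ void doStuff(double *d_A, size_t d_pitch);

int main()
{
    double *d_A;
    size_t d_pitch;

    // Allocate SIZE rows of SIZE doubles; d_pitch receives the (possibly padded) row width in bytes
    cudaMallocPitch((void**)&d_A, &d_pitch, sizeof(double) * SIZE, SIZE);

    dim3 blocks(4, 4);
    dim3 threads(16, 16);

    doStuff<<<blocks, threads>>>(d_A, d_pitch);

    double *A;
    size_t pitch = sizeof(double) * SIZE;   // host rows are packed with no padding

    A = (double*)malloc(sizeof(double) * SIZE * SIZE);

    // Copy SIZE rows of sizeof(double) * SIZE bytes each from the pitched device array
    cudaMemcpy2D(A, pitch, d_A, d_pitch, sizeof(double) * SIZE, SIZE, cudaMemcpyDeviceToHost);

    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) printf("%f ", A[sizeof(double) * i + j]);
        printf("\n");
    }

    free(A);
    cudaFree(d_A);
    return 0;
}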

and doStuff is:

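__global__ void doStuff(double *d_A, size_t d_pitch)
{
    // i is the row index (x dimension), j the column index (y dimension)
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int j = blockIdx.y * blockDim.y + threadIdx.y;
    // Row i starts d_pitch * i bytes past d_A; j then counts doubles within the row
    double *target = ( (double*)(((char*)d_A) + (d_pitch * i)) ) + j;

    // Skip threads that fall outside the SIZE x SIZE matrix
    if (i < SIZE && j < SIZE)
        *target = (i + 1) * (j + 1) + 0.0;
}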
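The pointer arithmetic in doStuff is the usual pitched-memory pattern: the row offset is applied in bytes (through a char* cast) because d_pitch is a byte count, and the column offset is then applied in elements of double. As a minimal host-only sketch of that arithmetic (no GPU needed), the hypothetical helper pitched_elem below does the same computation; the 64-byte pitch in the example is an assumed value, since the real pitch is whatever cudaMallocPitch returns.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of CUDA): returns a pointer to element
// (row, col) of a pitched 2D array of doubles. `pitch` is the row stride in
// BYTES, as reported by cudaMallocPitch, and may exceed cols * sizeof(double).
double *pitched_elem(double *base, std::size_t pitch,
                     std::size_t row, std::size_t col)
{
    // The requested element must fit inside one row's pitch.
    assert((col + 1) * sizeof(double) <= pitch);
    // Advance to the start of the row in bytes...
    char *row_start = reinterpret_cast<char *>(base) + pitch * row;
    // ...then index within the row in elements.
    return reinterpret_cast<double *>(row_start) + col;
}
```

With an assumed pitch of 64 bytes (8 doubles per row), element (2, 3) sits 2 * 64 + 3 * 8 = 152 bytes, i.e. 19 doubles, past the base pointer, so in a plain buffer of doubles it is buf[19].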

So doStuff is the same as d_A[i][j] = (i+1)*(j+1). If SIZE is 5, what I expected is:

1 2 3 4 5
2 4 6 8 10
3 6 9 12 15
4 8 12 16 20
5 10 15 20 25

in double precision. However, when I compiled and ran it, I got:

1 2 3 4 5
8 10 3 6 9
8 12 16 20 5
25 0 0 0 0
0 0 0 0 0

It looks as if, for each row, cudaMemcpy2D overwrites the previous row's data. I tried changing the pitch and width arguments to find the problem, but couldn't.

So what is wrong with my code?
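For reference, cudaMemcpy2D copies `height` rows of `width` bytes, stepping through the source by `spitch` bytes and through the destination by `dpitch` bytes per row, and `width` must not exceed either pitch. The sketch below is a hypothetical host-only model of that behavior (memcpy2d_host is an invented name, not a CUDA runtime function), so the row arithmetic can be checked without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-only model of a cudaMemcpy2D-style copy (NOT the CUDA
// API): copies `height` rows of `width` bytes each, advancing the source by
// `spitch` bytes and the destination by `dpitch` bytes after every row.
void memcpy2d_host(void *dst, std::size_t dpitch,
                   const void *src, std::size_t spitch,
                   std::size_t width, std::size_t height)
{
    // Mirrors cudaMemcpy2D's requirement that width not exceed either pitch.
    assert(width <= dpitch && width <= spitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(static_cast<char *>(dst) + row * dpitch,
                    static_cast<const char *>(src) + row * spitch,
                    width);
    }
}
```

In this model, copying a 5 x 5 matrix from a source padded to 8 doubles per row (an assumed pitch) into a dense destination with dpitch = 5 * sizeof(double) places element (i, j) at dst[5 * i + j]. No row overwrites another; the destination simply ends up in dense row-major order.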

Robert Crovella · Accepted Answer

The error is in this line:

    for (int j = 0; j < SIZE; j++) printf("%f ", A[sizeof(double) * i + j]);

It should be:

    for (int j = 0; j < SIZE; j++) printf("%f ", A[SIZE * i + j]);

You want to scale the row index (i) by the length of a row in elements (SIZE), not by the size of an element in bytes (sizeof(double)).

This has nothing to do with CUDA of course.
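Because the bug is ordinary host-side indexing, the garbled output can be reproduced without a GPU. With sizeof(double) == 8, the original expression reads A[8 * i + j], so for i = 1 the loop prints elements 8 to 12 of the dense array (the end of matrix row 1 and the start of row 2), and for i = 3 every index after the first (24) runs past the 25-element buffer. The minimal sketch below assumes SIZE is 5, as in the question, and uses hypothetical function names; it returns -1.0 where the original code would read out of bounds. Such reads are undefined behavior, so the zeros printed there carry no meaning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int SIZE = 5;  // matches the question's example

// Dense SIZE x SIZE row-major matrix, laid out like the host buffer A after
// cudaMemcpy2D: element (i, j) holds (i + 1) * (j + 1).
std::vector<double> dense_matrix()
{
    std::vector<double> A(SIZE * SIZE);
    for (int i = 0; i < SIZE; ++i)
        for (int j = 0; j < SIZE; ++j)
            A[SIZE * i + j] = (i + 1) * (j + 1);
    return A;
}

// Value read by the question's original index, sizeof(double) * i + j.
// Returns -1.0 where that index is past the end of the buffer, i.e. where
// the real program reads out of bounds.
double buggy_read(const std::vector<double> &A, int i, int j)
{
    std::size_t idx = sizeof(double) * i + j;
    return idx < A.size() ? A[idx] : -1.0;
}

// Value read by the corrected index, SIZE * i + j.
double fixed_read(const std::vector<double> &A, int i, int j)
{
    assert(i >= 0 && i < SIZE && j >= 0 && j < SIZE);
    return A[SIZE * i + j];
}
```

For i = 1, buggy_read yields 8, 10, 3, 6, 9, matching the second line of the question's output, while fixed_read yields 2, 4, 6, 8, 10.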
