Serris Filippos

Reputation: 39

How to allocate and copy 2D arrays between host and device in CUDA correctly

Is it possible to copy a 2D host array allocated like this

h_A=(int**)malloc(N*sizeof(int*));

for(i=0;i<N;i++)
{
   h_A[i]=(int*)malloc(3*sizeof(int));
}

to a 2D device array allocated like this?

cudaMallocPitch((void**)&d_A, &pitch, 3*sizeof(int), N);

I tried copying from host to device and back to host to check whether the process worked. Only the first 2 rows were copied correctly:

https://drive.google.com/file/d/1gXpChyYd2Div0pDjTRxZhwYd7GHRfjXN/view?usp=sharing

Copy from host array h_A to device array d_A

cudaMemcpy2D(d_A, pitch, h_A, 3*sizeof(int), 3*sizeof(int), N, cudaMemcpyHostToDevice);

Copy from device array d_A to host array h_B

cudaMemcpy2D(h_B, pitch, d_A, 3*sizeof(int), 3*sizeof(int), N, cudaMemcpyDeviceToHost);

Upvotes: 0

Views: 151

Answers (1)

talonmies

Reputation: 72350

If you allocate an array of pointers to store rows, like this:

h_A=(int**)malloc(N*sizeof(int*));

for(i=0;i<N;i++)
{
   h_A[i]=(int*)malloc(3*sizeof(int));
}

then allocating a comparable device-side structure in conventional device memory, and copying the data into it, requires this:

dh_A=(int**)malloc(N*sizeof(int*));

for(i=0;i<N;i++)
{
   int* p; 
   cudaMalloc(&p, 3*sizeof(int));
   cudaMemcpy(p, h_A[i], 3*sizeof(int), cudaMemcpyHostToDevice);
   dh_A[i]=p;
}

int** d_A;
cudaMalloc(&d_A, N*sizeof(int*));
cudaMemcpy(d_A, dh_A, N*sizeof(int*), cudaMemcpyHostToDevice);

[Note: all code written in browser, not guaranteed to compile or work correctly]

I will leave the device-to-host copy as an exercise for the reader. At this point you might conclude that it is easier to just use linear memory on both the host and the device. It will be both simpler and faster, because each matrix then needs only one allocation and one copy call.
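As a rough illustration of the linear-memory approach, here is a minimal sketch of a host-to-device-to-host round trip using a single contiguous host buffer together with cudaMallocPitch and cudaMemcpy2D. The function name roundtrip_ok and the macro COLS are hypothetical, chosen only for this example, and the sketch assumes n > 0 and a row width of 3 ints as in the question. Like the code above, it has not been run against the asker's setup.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define COLS 3  /* row width from the question (hypothetical macro name) */

/* Hypothetical helper: copies an n x COLS int matrix to a pitched device
   allocation and back. Returns 1 if every element survived, 0 otherwise. */
int roundtrip_ok(int n)
{
    size_t row_bytes = COLS * sizeof(int);
    /* One contiguous block per matrix: element (i, j) lives at h[i*COLS + j]. */
    int *h_A = (int*)malloc(n * row_bytes);
    int *h_B = (int*)malloc(n * row_bytes);
    int *d_A = NULL;
    size_t pitch = 0;
    int ok = (h_A != NULL && h_B != NULL);

    /* Fill the source with known values and the destination with a sentinel. */
    for (int i = 0; ok && i < n * COLS; i++) { h_A[i] = i; h_B[i] = -1; }

    /* Device rows are padded to `pitch` bytes; host rows are tightly packed. */
    ok = ok && cudaMallocPitch((void**)&d_A, &pitch, row_bytes, n) == cudaSuccess;

    /* Host to device: destination pitch is the device pitch,
       source pitch is the packed host row size. */
    ok = ok && cudaMemcpy2D(d_A, pitch, h_A, row_bytes,
                            row_bytes, n, cudaMemcpyHostToDevice) == cudaSuccess;

    /* Device to host: the two pitch arguments swap roles. */
    ok = ok && cudaMemcpy2D(h_B, row_bytes, d_A, pitch,
                            row_bytes, n, cudaMemcpyDeviceToHost) == cudaSuccess;

    /* Verify the round trip element by element. */
    for (int i = 0; ok && i < n * COLS; i++) ok = (h_A[i] == h_B[i]);

    cudaFree(d_A);  /* cudaFree(NULL) is a harmless no-op */
    free(h_A);
    free(h_B);
    return ok;
}
```

Calling roundtrip_ok(N) from main should return 1 on a machine with a working CUDA device. The point to notice is that in cudaMemcpy2D each pitch argument describes the buffer it sits next to: the destination pitch follows the destination pointer and the source pitch follows the source pointer, so the pitch returned by cudaMallocPitch always accompanies d_A.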

Upvotes: 1
