Reputation: 21
I'm trying to copy a 2-dimensional array from host to device with cudaMallocPitch and cudaMemcpy2D, but the value seems to end up as 0 on the device.
I'll write the basics of my code in the browser. I know the value on the host is not 0, yet the kernel prints 0. Any ideas?
__global__ void kernel(float **d_array) {
    printf("%f", d_array[0][0]);
}
void kernelWrapper(int rows, int cols, float **array) {
    float **d_array;
    size_t pitch;
    cudaMallocPitch((void**) &d_array, &pitch, rows*sizeof(float), cols);
    cudaMemcpy2D(d_array, pitch, array, rows*sizeof(float), rows*sizeof(float), cols, cudaMemcpyHostToDevice);
    kernel<<<1,1>>>(d_array);
}
For some reason, the kernel keeps printing 0.0000. I know that the first element is not 0 as I tested printing the first element of the host array. What is happening?
EDIT: I tried this code as well but got invalid pointer errors.
cudaMalloc(d_array, rows*sizeof(float*));
for (int i = 0; i < rows; i++) {
    cudaMalloc((void**) &d_array[i], cols*sizeof(float));
}
cudaMemcpy(d_array, array, rows*sizeof(float*), cudaMemcpyHostToDevice);
Upvotes: 1
Views: 1734
Reputation: 151879
Despite its name, cudaMemcpy2D does not copy a doubly-subscripted C host array (**) to a doubly-subscripted (**) device array. You'll note that it expects single pointers (*) to be passed to it, not double pointers (**). cudaMemcpy2D is used for copying a flat, strided array, not a 2-dimensional array. There are 2 dimensions inherent in the concept of strided access, which is where the name comes from.
In general, copying a 2D array from host to device takes more than a single API call. You are advised to flatten your array so you can reference it with a single pointer (*); then the API calls will work. There are plenty of examples of proper usage of cudaMemcpy2D on SO; just search for them.
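As a rough illustration of the flattening advice, here is a minimal sketch. The names flattenCopyAndRead and readPitched are hypothetical, and it assumes the host data arrives as a float** with rows row pointers of cols floats each. Note that the width passed to cudaMallocPitch and cudaMemcpy2D is a row length in bytes (cols*sizeof(float)) and the height is the number of rows. Return codes are ignored here only to keep the sketch short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Reads element (row, col) of a pitched allocation. The pitch is in bytes,
// so the row offset is computed on a char pointer before casting back.
__global__ void readPitched(const float *d_array, size_t pitch,
                            int row, int col, float *d_out) {
    const float *rowPtr = (const float *)((const char *)d_array + row * pitch);
    *d_out = rowPtr[col];
}

// Hypothetical helper: flattens a float** host array, copies it to a pitched
// device allocation, and returns element (row, col) as seen by the device.
float flattenCopyAndRead(float *const *array, int rows, int cols,
                         int row, int col) {
    // Flatten: row i occupies h_flat[i*cols .. i*cols + cols - 1].
    std::vector<float> h_flat((size_t)rows * cols);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            h_flat[(size_t)i * cols + j] = array[i][j];

    float *d_array = nullptr;   // single pointer, not float**
    float *d_out = nullptr;
    size_t pitch = 0;
    float result = 0.0f;

    // Width is in bytes per row; height is the number of rows.
    cudaMallocPitch((void **)&d_array, &pitch, cols * sizeof(float), rows);
    cudaMalloc((void **)&d_out, sizeof(float));

    // Host rows are tightly packed, so the source pitch equals the row width.
    cudaMemcpy2D(d_array, pitch, h_flat.data(), cols * sizeof(float),
                 cols * sizeof(float), rows, cudaMemcpyHostToDevice);

    readPitched<<<1, 1>>>(d_array, pitch, row, col, d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_array);
    return result;
}
```

Calling flattenCopyAndRead(array, rows, cols, 0, 0) from host code should return the same value as array[0][0], since the kernel now indexes a flat pitched buffer rather than dereferencing host pointers.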
Also, you should do CUDA error checking on all CUDA API calls and kernel launches whenever you are having difficulty with CUDA code.
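One possible shape for such checking is sketched below. The helper names cudaOk and launchAndCheck are hypothetical; the runtime calls used (cudaGetErrorString, cudaGetLastError, cudaDeviceSynchronize) are part of the CUDA runtime API. Kernel launches return no status themselves, so the sketch queries the launch error and then synchronizes to surface errors raised while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA call and returns false,
// or returns true when the call succeeded.
bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error in %s: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void emptyKernel() {}

// Hypothetical helper: launches a kernel and checks both kinds of failure.
bool launchAndCheck() {
    emptyKernel<<<1, 1>>>();
    // Errors in the launch itself (e.g. a bad configuration).
    if (!cudaOk(cudaGetLastError(), "kernel launch")) return false;
    // Errors that occur while the kernel executes.
    return cudaOk(cudaDeviceSynchronize(), "kernel execution");
}
```

Wrapping each call from the question in a check like this, for example cudaOk(cudaMemcpy2D(...), "cudaMemcpy2D"), would likely have pointed at the failing call directly.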
If you really want to copy a 2D array directly, take a look at this question/answer for a worked example. It's not trivial.
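For a sense of why it is not trivial, here is a minimal sketch of one common approach, which is not necessarily the one used in the linked answer. The names deepCopyAndRead and readDoublePtr are hypothetical. Each row gets its own device allocation, the device row pointers are staged in a host array, and that array of pointers is then copied to the device. The staging step matters because host code cannot write through a device pointer, which is what &d_array[i] in the question's EDIT attempts. Return codes are ignored only to keep the sketch short.

```cuda
#include <vector>
#include <cuda_runtime.h>

// With a true array of device row pointers, d_array[row][col] is valid.
__global__ void readDoublePtr(float **d_array, int row, int col, float *d_out) {
    *d_out = d_array[row][col];
}

// Hypothetical helper: deep-copies a float** host array to the device and
// returns element (row, col) as seen by the device.
float deepCopyAndRead(float *const *array, int rows, int cols,
                      int row, int col) {
    // Host-side staging array that holds DEVICE row pointers.
    std::vector<float *> h_rows(rows, nullptr);
    for (int i = 0; i < rows; ++i) {
        cudaMalloc((void **)&h_rows[i], cols * sizeof(float));
        cudaMemcpy(h_rows[i], array[i], cols * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    // Device array of row pointers, filled from the staging array.
    float **d_array = nullptr;
    cudaMalloc((void **)&d_array, rows * sizeof(float *));
    cudaMemcpy(d_array, h_rows.data(), rows * sizeof(float *),
               cudaMemcpyHostToDevice);

    float *d_out = nullptr;
    float result = 0.0f;
    cudaMalloc((void **)&d_out, sizeof(float));
    readDoublePtr<<<1, 1>>>(d_array, row, col, d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    // Free every row, then the pointer array.
    cudaFree(d_out);
    for (int i = 0; i < rows; ++i) cudaFree(h_rows[i]);
    cudaFree(d_array);
    return result;
}
```

This keeps the kernel's d_array[0][0] syntax, at the cost of one allocation and one copy per row plus an extra pointer dereference on every access, which is one reason the flattened layout is usually preferred.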
Upvotes: 3