Reputation: 336
I have a 2D array dev_histogram stored on the GPU and a 2D array histogram stored on the CPU. I want to copy the contents of dev_histogram into histogram. Below are the relevant bits of my program; I can post the full code as well.
int *dev_histogram; // Array for histogram, GPU
int histogram[SIZE_THETA][SIZE_RHO]; // Array for histogram, CPU
size_t pitch;
histogramSize = sizeof(int) * SIZE_THETA * SIZE_RHO;
cudaMallocPitch((void**)&dev_histogram, &pitch, SIZE_THETA * sizeof(int), SIZE_RHO)
houghTransformation<<<width, height>>>(dev_edges, dev_histogram, pitch, n_pixels, width, height);
// Here I get a Segmentation fault:
cudaMemcpy2D(histogram, pitch, dev_histogram, SIZE_THETA * sizeof(int), SIZE_THETA * sizeof(int), SIZE_RHO * sizeof(int), cudaMemcpyDeviceToHost)
Could you please help me understand how to copy my matrix back? Mostly, I am confused about what to pass as the pitch for my source.
Upvotes: 1
Views: 1126
Reputation: 2053
In the CUDA toolkit reference manual you can see that the pitch returned by cudaMallocPitch is the allocated width, in bytes, of each row of the 2D array. Your dev_histogram therefore has an actual row width of pitch bytes and a height equal to the height you specified. Each row of your 2D array has pitch bytes allocated, but only the first SIZE_THETA * sizeof(int) bytes hold valid data; the rest is padding.
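To make the row layout concrete, here is a minimal host-only sketch of how an element is addressed in a pitched buffer. It runs without a GPU and only mimics the layout; `rowPtr` and `makePitched` are hypothetical helper names, not CUDA functions, and the pitch value is whatever you pass in rather than one chosen by cudaMallocPitch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUDA API): start of row `row` in a pitched buffer.
// The row offset is computed in bytes, then the result is viewed as int*.
inline int* rowPtr(void* base, std::size_t pitchBytes, std::size_t row) {
    return reinterpret_cast<int*>(static_cast<unsigned char*>(base) + row * pitchBytes);
}

// Builds a host buffer with `rows` rows of `pitchBytes` each and stores
// row * 100 + col in the first `cols` ints of every row; the rest is padding.
inline std::vector<unsigned char> makePitched(std::size_t pitchBytes,
                                              std::size_t cols, std::size_t rows) {
    assert(pitchBytes >= cols * sizeof(int));   // a row must fit inside its pitch
    assert(pitchBytes % sizeof(int) == 0);      // keeps every row int-aligned
    std::vector<unsigned char> buf(pitchBytes * rows, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        int* row = rowPtr(buf.data(), pitchBytes, r);
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] = static_cast<int>(r * 100 + c);
        }
    }
    return buf;
}
```

With a pitch of 32 bytes and 5 ints per row, `rowPtr(buf.data(), 32, 2)[4]` reads element (2, 4). The same byte-offset arithmetic is what a kernel would apply to dev_histogram using the pitch it receives.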
In the same document the prototype for cudaMemcpy2D is
cudaError_t cudaMemcpy2D(void *dst, size_t dpitch, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)
Here dst is your array on the host, dpitch is the pitch (row width in bytes) of the destination array (histogram), and spitch is the pitch of the source array (dev_histogram). width is the number of bytes to copy per row and height is the number of rows. You must therefore call it like this:
cudaMemcpy2D(histogram, SIZE_THETA*sizeof(int), dev_histogram, pitch, SIZE_THETA * sizeof(int), SIZE_RHO, cudaMemcpyDeviceToHost);
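As a rough illustration of what those parameters mean, the following host-only sketch copies row by row in the way the cudaMemcpy2D documentation describes. `copy2D` and `unpad` are hypothetical stand-ins written for this example; they are not the CUDA implementation and do no device transfers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for the row-by-row behaviour of cudaMemcpy2D:
// copy `heightRows` rows of `widthBytes` each, stepping by each side's own pitch.
inline void copy2D(void* dst, std::size_t dpitch,
                   const void* src, std::size_t spitch,
                   std::size_t widthBytes, std::size_t heightRows) {
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t r = 0; r < heightRows; ++r) {
        std::memcpy(d + r * dpitch, s + r * spitch, widthBytes);
    }
}

// Copies a padded source (rows x cols ints, `spitch` bytes per row) into a
// dense array whose pitch equals its row width, like the host histogram.
inline std::vector<int> unpad(const std::vector<unsigned char>& src, std::size_t spitch,
                              std::size_t cols, std::size_t rows) {
    std::vector<int> dense(cols * rows);
    copy2D(dense.data(), cols * sizeof(int),   // dst, dpitch: dense host rows
           src.data(), spitch,                 // src, spitch: padded rows
           cols * sizeof(int), rows);          // width in bytes, height in rows
    return dense;
}
```

The argument order in `unpad` mirrors the call above: the dense side uses its row width as pitch, the padded side uses the allocation pitch, and the height is a row count.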
Edit: after ArchaeaSoftware's answer I noticed that the height is indeed the number of rows; a height in bytes doesn't make sense. I have updated the answer accordingly, but you still need to change the pitches as well.
Upvotes: 1
Reputation: 4422
Specify SIZE_RHO as the height, not SIZE_RHO * sizeof(int):
<cudaMemcpy2D(histogram, pitch, dev_histogram, SIZE_THETA * sizeof(int), SIZE_THETA * sizeof(int), SIZE_RHO * sizeof(int), cudaMemcpyDeviceToHost);
>cudaMemcpy2D(histogram, pitch, dev_histogram, SIZE_THETA * sizeof(int), SIZE_THETA * sizeof(int), SIZE_RHO, cudaMemcpyDeviceToHost);
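To see why a height in bytes overruns the host array, it helps to compute how far into the destination a 2D copy reaches. The sketch below is host-only arithmetic; `bytesTouched` and `fitsInBuffer` are hypothetical helpers, and the sizes in the note after it (SIZE_THETA = 180, SIZE_RHO = 100, 4-byte int, device pitch of 1024) are assumptions chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Bytes reached from the start of a buffer by a 2D copy: every row except the
// last advances by a full pitch, and the last row contributes only its width.
inline std::size_t bytesTouched(std::size_t pitch, std::size_t widthBytes,
                                std::size_t heightRows) {
    assert(widthBytes <= pitch);
    return heightRows == 0 ? 0 : pitch * (heightRows - 1) + widthBytes;
}

// True if a copy with these parameters stays inside a buffer of `bufferBytes`.
inline bool fitsInBuffer(std::size_t bufferBytes, std::size_t pitch,
                         std::size_t widthBytes, std::size_t heightRows) {
    return bytesTouched(pitch, widthBytes, heightRows) <= bufferBytes;
}
```

Under those assumed sizes the host array holds 72000 bytes. A copy with dpitch 720, width 720 and height 100 reaches exactly 72000 bytes, whereas a height of 100 * sizeof(int) = 400 reaches 288000 bytes. The same check reports an overrun when the destination pitch is the larger device pitch, for example `fitsInBuffer(72000, 1024, 720, 100)` is false.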
Upvotes: 1
Reputation: 3686
Often, when storing data in contiguous memory, you want each row to occupy a multiple of some storage unit so that the data can be read efficiently. For example, rather than reading 4 individual bytes in a row, you might read one 32-bit word. Look up memory alignment for the details.
For the same reason, you allocate certain arrays with a size of pitch*height, where pitch is the width rounded up to the nearest multiple of whatever storage unit you are using. If your array is 31*5 bytes, you use a pitch of 32 but a width of 31. Eight 32-bit reads per row are expected to be faster than thirty-one 1-byte reads. You simply discard the extra "padding" byte.
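The rounding itself is simple integer arithmetic. Here is a small sketch; `roundUpToMultiple` and `paddingBytes` are hypothetical names, and the alignment you pass is an assumption, since the real pitch is chosen by cudaMallocPitch and should always be taken from its output rather than computed by hand.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `widthBytes` up to the next multiple of `alignment` (alignment > 0).
inline std::size_t roundUpToMultiple(std::size_t widthBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Unused bytes at the end of each row once the width has been rounded up.
inline std::size_t paddingBytes(std::size_t widthBytes, std::size_t alignment) {
    return roundUpToMultiple(widthBytes, alignment) - widthBytes;
}
```

For the 31-byte row above, `roundUpToMultiple(31, 32)` gives a pitch of 32 and `paddingBytes(31, 32)` gives the single discarded byte.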
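For the host array you probably want pitch = width in bytes, because histogram is a plain array with no padding. The pitch variable itself is filled in by cudaMallocPitch, so it only holds a valid value if that call succeeded; check its return code. Also check that the width and height you pass as launch parameters are compatible with your GPU's limits on grid and thread block size.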
Upvotes: 0