sam

Reputation: 159

OpenCV - Copy GpuMat into cuda device data

I am trying to copy the data in a cv::cuda::GpuMat to a uint8_t* variable which is to be used in a kernel.

The GpuMat contains image data with a resolution of 752x480 and type CV_8UC1. Below is the sample code:

uint8_t *imgPtr;
cv::Mat left, downloadedLeft;
cv::cuda::GpuMat gpuLeft;

left = imread("leftview.jpg", cv::IMREAD_GRAYSCALE);
gpuLeft.upload(left);

cudaMalloc((void **)&imgPtr, sizeof(uint8_t)*gpuLeft.rows*gpuLeft.cols);
cudaMemcpyAsync(imgPtr, gpuLeft.ptr<uint8_t>(), sizeof(uint8_t)*gpuLeft.rows*gpuLeft.cols, cudaMemcpyDeviceToDevice);

// following code is just for testing and visualization...
cv::cuda::GpuMat gpuImg(left.rows, left.cols, left.type(), imgPtr);
gpuImg.download(downloadedLeft);
imshow ("test", downloadedLeft);
waitKey(0);

But the output is not as expected. The input and output images are shown below.

[Input image]

[Output image]

I have tried passing the cv::Mat source to cudaMemcpy directly, and that seems to work fine. The issue seems to be with the combination of cv::cuda::GpuMat and cudaMemcpy. A similar issue is discussed here.

Also, if the image width is 256 or 512, the program seems to work fine.

What am I missing? What should be done for the 752x480 image to work properly?

Upvotes: 2

Views: 9525

Answers (1)

talonmies

Reputation: 72349

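OpenCV GpuMat uses strided (pitched) storage: each row occupies step bytes, which can be more than the number of columns times the element size, so the image is not necessarily stored contiguously in memory. In short, your example fails for most cases because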

  1. You don't copy the whole image to the CUDA memory allocation, and
  2. You don't correctly specify the memory layout when you create the second GpuMat instance from the GPU pointer.
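To make the layout issue concrete, here is a minimal host-side sketch in plain C++ (no CUDA or OpenCV needed). The PitchedImage struct and the helper names are hypothetical stand-ins for illustration only, not part of OpenCV; the sketch just mimics a single-channel 8-bit image whose rows are padded out to step bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a pitched CV_8UC1 image: every row occupies
// `step` bytes, of which only the first `cols` bytes are pixels.
struct PitchedImage {
    int rows;
    int cols;
    std::size_t step;                // row pitch in bytes, step >= cols
    std::vector<std::uint8_t> data;  // rows * step bytes in total
};

// Builds an image where pixel (r, c) holds (r * cols + c) % 255 and every
// padding byte holds 0xFF, so padding is easy to spot.
PitchedImage makePitched(int rows, int cols, std::size_t step) {
    PitchedImage img{rows, cols, step,
                     std::vector<std::uint8_t>(rows * step, 0xFF)};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            img.data[static_cast<std::size_t>(r) * step + c] =
                static_cast<std::uint8_t>((r * cols + c) % 255);
    return img;
}

// Correct addressing: rows are `step` bytes apart.
std::uint8_t pixelAt(const PitchedImage& img, int r, int c) {
    return img.data[static_cast<std::size_t>(r) * img.step + c];
}

// Incorrect addressing: assumes rows are `cols` bytes apart, which is what
// a flat rows * cols copy implicitly does.
std::uint8_t pixelAtFlat(const PitchedImage& img, int r, int c) {
    return img.data[static_cast<std::size_t>(r) * img.cols + c];
}
```

With a 3x5 image and a step of 8, pixelAt(img, 1, 0) returns the real pixel, while pixelAtFlat(img, 1, 0) lands on a padding byte of row 0. Each later row drifts further, which would explain a sheared or skewed output image.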

By my reading of the documentation, you probably want something like this:

uint8_t *imgPtr;
cv::Mat left, downloadedLeft;
cv::cuda::GpuMat gpuLeft;

left = imread("leftview.jpg", cv::IMREAD_GRAYSCALE);
gpuLeft.upload(left);

// Allocate and copy rows * step bytes so that the per-row padding is included.
cudaMalloc((void **)&imgPtr, gpuLeft.rows*gpuLeft.step);
cudaMemcpyAsync(imgPtr, gpuLeft.ptr<uint8_t>(), gpuLeft.rows*gpuLeft.step, cudaMemcpyDeviceToDevice);

// following code is just for testing and visualization...
// Pass the source step so the wrapper reads the buffer with the same row layout.
cv::cuda::GpuMat gpuImg(left.rows, left.cols, left.type(), imgPtr, gpuLeft.step);
gpuImg.download(downloadedLeft);
imshow ("test", downloadedLeft);
waitKey(0);

[Written by someone who has never used OpenCV, not compiled or tested, use at own risk]

The only time your code would work correctly is when the row pitch of the GpuMat happens to equal the number of columns times the size of the element type stored in the matrix. This is likely to be the case for images whose widths are round powers of two, which matches the 256 and 512 cases you observed.
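If the kernel really needs a tightly packed buffer rather than a pitched one, the usual approach is a row-by-row copy that skips the padding. The sketch below shows the idea on host memory in plain C++; isTightlyPacked and packRows are hypothetical helper names, not OpenCV or CUDA functions. On the device, the CUDA runtime call cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind) does the same kind of copy with separate source and destination pitches, though I have not tested it against a GpuMat here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True when rows have no padding, i.e. the condition under which a flat
// rows * cols * elemSize copy happens to be correct.
bool isTightlyPacked(int cols, std::size_t elemSize, std::size_t step) {
    return step == static_cast<std::size_t>(cols) * elemSize;
}

// Copies `rows` rows of `cols` elements from a pitched source (rows are
// `srcStep` bytes apart) into a contiguous buffer, dropping the padding.
std::vector<std::uint8_t> packRows(const std::uint8_t* src,
                                   std::size_t srcStep,
                                   int rows, int cols,
                                   std::size_t elemSize) {
    const std::size_t rowBytes = static_cast<std::size_t>(cols) * elemSize;
    std::vector<std::uint8_t> dst(rowBytes * static_cast<std::size_t>(rows));
    for (int r = 0; r < rows; ++r) {
        std::memcpy(dst.data() + static_cast<std::size_t>(r) * rowBytes,
                    src + static_cast<std::size_t>(r) * srcStep,
                    rowBytes);
    }
    return dst;
}
```

The trade-off is that a packed buffer lets the kernel index with row * cols + col, whereas keeping the pitched layout avoids the extra copy but means the kernel has to receive step and index with row * step + col.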

Upvotes: 5
