Winotix

Reputation: 11

cv::cuda::GpuMat::create allocates much more than requested

I'm using the latest OpenCV 4.x with CUDA support and CUDA 11.6.

I want to allocate a GpuMat image in device memory like this:

cv::cuda::GpuMat test1;
test1.create(100, 1000000, CV_8UC1);

I measure the memory consumed before and after the create call (using the nvidia-smi tool).

Before:
|    0   N/A  N/A    372354      C   ...aur/example_build/example      199MiB |
After:
|    0   N/A  N/A    389636      C   ...aur/example_build/example      295MiB |

So about +100 MB, which makes sense.

But when I allocate the image this way (width and height swapped):

cv::cuda::GpuMat test1;
test1.create(1000000, 100, CV_8UC1);

I see this:

Before:
|    0   N/A  N/A    379124      C   ...aur/example_build/example      199MiB |
After:
|    0   N/A  N/A    379124      C   ...aur/example_build/example      689MiB |

I expected the same increment as in test1 though. In various cases, consumption is x5 more than expected, when the image is "high and narrow". What do I understand wrong?

Upvotes: 1

Views: 941

Answers (1)

Robert Crovella

Reputation: 151944

In various cases, consumption is x5 more than expected, when the image is "high and narrow". What do I understand wrong?

OpenCV GpuMat uses a pitched allocation, so every row is padded out to the pitch. If the minimum pitch is, for example, 512 bytes, then allocating a "narrow" image is going to be extra-expensive.

On my Tesla V100, the minimum pitch (kind of like saying the minimum "width" in bytes for each row) for a pitched allocation is 512 bytes. Each 100-byte row therefore occupies 512 bytes: 512/100 ≈ 5x.
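
To make the arithmetic concrete, here is a minimal sketch of how the footprint of a pitched allocation could be estimated. It assumes each row is padded up to a multiple of the pitch alignment (512 bytes on the V100 above; your GPU may differ). The helper names `paddedPitch` and `pitchedBytes` are hypothetical and not part of CUDA or OpenCV.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a row width (in bytes) up to the next multiple
// of the pitch alignment. ASSUMPTION: the runtime pads rows this way.
std::size_t paddedPitch(std::size_t rowBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: estimated total bytes for an image of `rows` rows,
// each `rowBytes` wide, once every row is padded to the pitch.
std::size_t pitchedBytes(std::size_t rows, std::size_t rowBytes,
                         std::size_t alignment) {
    return rows * paddedPitch(rowBytes, alignment);
}
```

Under that assumption, `pitchedBytes(1000000, 100, 512)` gives 512,000,000 bytes (about 488 MiB), in line with the jump from 199 MiB to 689 MiB in the question, while `pitchedBytes(100, 1000000, 512)` gives 100,044,800 bytes (about 95 MiB), in line with the jump from 199 MiB to 295 MiB.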

No, I don't have any suggestions for workarounds. Allocate a wider image, or accept the extra cost.

I think most CUDA GPUs will have a minimum pitch of 512 bytes, because the minimum texture alignment is 512 bytes. You can use the following code to find yours:

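$ cat t2060.cu
#include <iostream>

int main(){

  char *d;   // device pointer to the pitched allocation
  size_t p;  // receives the pitch (bytes per row) chosen by the runtime
  // request 100 rows of 1 byte each; the runtime pads each row to its minimum pitch
  cudaError_t err = cudaMallocPitch(&d, &p, 1, 100);
  if (err != cudaSuccess) {
    std::cout << cudaGetErrorString(err) << std::endl;
    return 1;
  }
  std::cout << p << std::endl;
  cudaFree(d);
}
$ nvcc -o t2060 t2060.cu
$ compute-sanitizer ./t2060
========= COMPUTE-SANITIZER
512
========= ERROR SUMMARY: 0 errors
$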

(As an aside, I don't know how you decided that your first example shows +100MB. I see 199MiB and 201MiB. The difference between those two appears to be 2MB. But this doesn't seem to be the crux of your question, and the 500MB allocation for a 100MB image of width 100 bytes is explained above.)

Upvotes: 4
