Reputation: 11
I'm using the latest OpenCV 4.x with CUDA supoprt + CUDA 11.6.
I want to allocate GpuMat
image in device memory by doing so:
cv::cuda::GpuMat test1;
test1.create(100, 1000000, CV_8UC1);
and I measure consumed memory before create
function call and after (using nvidia-smi
tool).
Before:
| 0 N/A N/A 372354 C ...aur/example_build/example 199MiB |
After:
| 0 N/A N/A 389636 C ...aur/example_build/example 295MiB |
So + ~100 MB - makes sense.
But when I allocate the image this way (changed W and H):
cv::cuda::GpuMat test1;
test1.create(1000000, 100, CV_8UC1);
I see this:
Before:
| 0 N/A N/A 379124 C ...aur/example_build/example 199MiB |
After:
| 0 N/A N/A 379124 C ...aur/example_build/example 689MiB |
I expected the same increment as in test1
though.
In various cases, consumption is x5 more than expected, when the image is "high and narrow". What do I understand wrong?
Upvotes: 1
Views: 941
Reputation: 151944
In various cases, consumption is x5 more than expected, when the image is "high and narrow". What do I understand wrong?
OpenCV GpuMat
uses a pitched allocation. If the minimum pitch is for example 512 bytes, then allocating a "narrow" image is going to be extra-expensive.
On my tesla V100, the minimum pitch (kind of like saying the minimum "width" for each line) for a pitched allocation is 512. 512/100 = 5x.
No I don't have any suggestions for workarounds. Allocate a wider image. Or accept the extra cost.
I think most CUDA GPUs will have a minimum pitch of 512 bytes, because the minimum texture alignment is 512 bytes. You can use the following code to find yours:
$ cat t2060.cu
#include <iostream>
int main(){
char *d;
size_t p;
cudaMallocPitch(&d, &p, 1, 100);
std::cout << p << std::endl;
}
$ nvcc -o t2060 t2060.cu
$ compute-sanitizer ./t2060
========= COMPUTE-SANITIZER
512
========= ERROR SUMMARY: 0 errors
$
(As an aside, I don't know how you decided that your first example shows +100MB. I see 199MiB and 201MiB. The difference between those two appears to be 2MB. But this doesn't seem to be the crux of your question, and the 500MB allocation for a 100MB image of width 100 bytes is explained above.)
Upvotes: 4