Reputation: 9912
I spent all day yesterday reading about how to use managed (unified) memory arrays in a CUDA program, using the book Professional CUDA Programming, and practiced with some of the sample code (although I still have doubts about the profiler info). I am now ready to apply it to my program, which uses both a CUDA kernel and some OpenCV functions.
I have several questions, but let me start with the first one here.
I have:
cv::Mat h_image;
h_image = cv::imread(dirname+image_filenames[ni], cv::IMREAD_GRAYSCALE);
cv::cuda::GpuMat d_image;
// 2. Upload the Image
d_image.upload(h_image);
So I read an image with imread and upload it to device memory.
How can I use unified memory for this?
In theory, with float arrays, I can use unified memory like this:
float *A;
cudaMallocManaged((void **)&A, nBytes);
or even (and I prefer this)
__device__ __managed__ float A[67108864];
Is there a way to do something similar with Mats and GpuMats?
Upvotes: 0
Views: 1114
Reputation: 3390
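This is another option, which will make every cv::Mat use shared memory (page-locked host memory mapped into the device address space, which is not the same thing as CUDA managed memory):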
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator(cv::cuda::HostMem::AllocType::SHARED));
Upvotes: 1
Reputation: 9912
I found an answer to this here (I have not tried it yet). It is actually a very good answer, since it explains how to use both unified memory and pinned memory.
It basically uses this:
#ifndef USE_UNIFIED_MEM
/* Pinned memory. No cache */
std::cout << "Using pinned memory" << std::endl;
void *device_ptr, *host_ptr;
cudaSetDeviceFlags(cudaDeviceMapHost);
cudaHostAlloc((void **)&host_ptr, frameByteSize, cudaHostAllocMapped);
cudaHostGetDevicePointer((void **)&device_ptr, (void *) host_ptr , 0);
cv::Mat frame_out(height, width, CV_8UC3, host_ptr);
cv::cuda::GpuMat d_frame_out(height, width, CV_8UC3, device_ptr);
#else
/* Unified memory */
std::cout << "Using unified memory" << std::endl;
void *unified_ptr;
cudaMallocManaged(&unified_ptr, frameByteSize);
cv::Mat frame_out(height, width, CV_8UC3, unified_ptr);
cv::cuda::GpuMat d_frame_out(height, width, CV_8UC3, unified_ptr);
#endif
I am going to try this too, but with a __device__ __managed__ array instead of allocating it dynamically.
Upvotes: 1