KansaiRobot

Reputation: 9912

How can you use managed (unified) memory for an image?

I spent all of yesterday reading how to use a managed (unified) memory array in a CUDA program (using the book Professional CUDA Programming) and practiced some of the sample code (although I still have doubts about the profiler output). Now I am ready to apply it to my program, which uses both a CUDA kernel and some OpenCV functions.

I have several questions, but let me address here the first one.

I have

cv::Mat h_image;
h_image = cv::imread(dirname+image_filenames[ni], cv::IMREAD_GRAYSCALE);

cv::cuda::GpuMat d_image;
// 2. Upload the Image
d_image.upload(h_image);

So I have an image read with imread and I upload it to the device memory. How can I use unified memory for this?

In theory, to use unified memory I can write (for float arrays)

float *A;
cudaMallocManaged((void **)&A, nBytes);

or even (and I prefer this)

__device__ __managed__ float A[67108864];

Is there a way to do something similar with Mats and GpuMats?

Upvotes: 0

Views: 1114

Answers (2)

Jakobovski

Reputation: 3390

This is another option, which will make all cv::Mat allocations use shared memory:

cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator(cv::cuda::HostMem::AllocType::SHARED));
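For illustration, a minimal sketch of how that call could be used in context (the filename and the zero-copy GpuMat header at the end are my assumptions, not part of the answer; the header trick relies on unified addressing, where the mapped host pointer is also valid on the device):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    // Route all subsequent cv::Mat allocations through CUDA shared
    // (mapped, page-locked) memory.
    cv::Mat::setDefaultAllocator(
        cv::cuda::HostMem::getAllocator(cv::cuda::HostMem::AllocType::SHARED));

    // This Mat's buffer now lives in shared memory, so no explicit
    // upload() is needed before the GPU can see the pixels.
    cv::Mat h_image = cv::imread("image.png", cv::IMREAD_GRAYSCALE);

    // On systems with unified addressing, the same pointer can back a
    // GpuMat header over the identical buffer (a zero-copy view).
    cv::cuda::GpuMat d_image(h_image.rows, h_image.cols,
                             h_image.type(), h_image.data);
    return 0;
}
```

Note that setDefaultAllocator affects every Mat created afterwards, so it is usually called once at startup.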

Upvotes: 1

KansaiRobot

Reputation: 9912

I found an answer to this (I have not tried it yet) here. It is actually a very good answer, since it explains how to use both unified memory and pinned memory.

It basically uses this

#ifndef USE_UNIFIED_MEM
    /* Pinned memory. No cache */
    std::cout << "Using pinned memory" << std::endl;
    void *device_ptr, *host_ptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&host_ptr, frameByteSize, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&device_ptr, (void *) host_ptr , 0);
    cv::Mat frame_out(height, width, CV_8UC3, host_ptr);
    cv::cuda::GpuMat d_frame_out(height, width, CV_8UC3, device_ptr);
#else
    /* Unified memory */
    std::cout << "Using unified memory" << std::endl;
    void *unified_ptr;
    cudaMallocManaged(&unified_ptr, frameByteSize);
    cv::Mat frame_out(height, width, CV_8UC3, unified_ptr);
    cv::cuda::GpuMat d_frame_out(height, width, CV_8UC3, unified_ptr);
#endif

I am going to try this, but using a static __device__ __managed__ array instead of allocating dynamically.
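A minimal sketch of what that static variant might look like (the fixed 1920x1080 frame size is my assumption, and the file must be compiled with nvcc, since __device__ __managed__ is a CUDA language extension):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>
#include <cuda_runtime.h>

constexpr int kHeight = 1080;  // assumed maximum frame size,
constexpr int kWidth  = 1920;  // known at compile time

// Statically allocated managed buffer, accessible from host and device
// through the same address; replaces the cudaMallocManaged() call.
__device__ __managed__ unsigned char frame_buf[kHeight * kWidth * 3];

int main() {
    // Both Mats are non-owning views over the same managed buffer.
    cv::Mat frame_out(kHeight, kWidth, CV_8UC3, frame_buf);
    cv::cuda::GpuMat d_frame_out(kHeight, kWidth, CV_8UC3, frame_buf);

    // ... fill frame_out on the host, launch kernels on d_frame_out,
    // and cudaDeviceSynchronize() before touching the data again ...
    return 0;
}
```

One trade-off: a static managed array fixes the size at compile time, so it only fits workflows where the frame dimensions never change.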

Upvotes: 1
