RAJ
RAJ

Reputation: 3

How to access Managed memory simultaneously by CPU and GPU in compute capability 5.0?

Since simultaneous access to managed memory on devices of compute capability lower than 6.x is not possible (CUDA Toolkit Documentation), is there a way to simulatneously access managed memory by CPU and GPU with compute capability 5.0 or any method that can make CPU access managed memory while GPU kernel is running.

Upvotes: 0

Views: 1045

Answers (1)

Robert Crovella
Robert Crovella

Reputation: 151849

is there a way to simulatneously access managed memory by CPU and GPU with compute capability 5.0

No.

or any method that can make CPU access managed memory while GPU kernel is running.

Not on a compute capability 5.0 device.

You can have "simultaneous" CPU and GPU access to data using CUDA zero-copy techniques.

A full tutorial on both Unified memory as well as Pinned/Mapped/Zero-copy memory is well beyond the scope of what I can write in an answer here. Unified Memory has its own section in the programming guide. Both of these topics are extensively covered here on the cuda tag on SO as well as many other places on the web. Any questions will likely be answerable with a google search.

In a nutshell, zero-copy memory on 64-bit OS is accessed via a host pinning API such as cudaHostAlloc(). The memory so allocated is host memory, and always remains there, but it is accessible to the GPU. The access to this memory from GPU to host memory occurs across the PCIE bus, so it is much slower than normal global memory access. The pointer returned by the allocation (on 64-bit OS) is usable in both host and device code. You can study CUDA sample codes that use zero-copy techniques such as simpleZeroCopy.

By contrast, ordinary unified memory (UM) is data that will be migrated to the processor that is using it. In a pre-pascal UM regime, this migration is triggered by kernel calls and synchronizing operations. Simultaneous access by host and device in this regime is not possible. For pascal and beyond devices in a proper UM post-pascal regime (basically, 64-bit linux only, CUDA 8+), the data is migrated on-demand, even during kernel execution, thus allowing a limited form of "simultaneous" access. Unified Memory has various behavior modes, and some of those will cause a unified memory allocation to "decay" into a pinned/zero-copy host allocation, under some circumstances.

Upvotes: 1

Related Questions