Grigorii Alekseev

Reputation: 177

Synchronization between CUDA applications

Is there a way to synchronize two different CUDA applications on the same GPU?

I have two different parts of a pipeline: the original process and post-processing. The original process uses the GPU, and now we're going to migrate post-processing to the GPU as well. Our architecture requires that these two processes be organized as two separate applications. Now I'm thinking about the synchronization problem:

Is there some flag for that purpose? Or some workaround?

Upvotes: 1

Views: 1373

Answers (1)

talonmies

Reputation: 72372

Is there a way to synchronize two different CUDA applications on the same GPU?

In a word, no. You would have to do this via some sort of inter-process communication mechanism on the host side.

If you are on Linux, or on Windows with a GPU in TCC mode, host IPC will still be required, but you can "interlock" CUDA activity in one process with CUDA activity in another process using the CUDA IPC mechanism. In particular, it is possible to communicate an event handle to another process using cudaIpcGetEventHandle and cudaIpcOpenEventHandle. This provides an event that you can use in a cudaStreamWaitEvent call. Of course, this is really only half of the solution; you will also need CUDA IPC memory handles to share the data buffer itself. The CUDA simpleIPC sample code has most of the plumbing you need.
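As a rough sketch of just that event-handle exchange (export_event and import_event are hypothetical helper names; the handle bytes travel over whatever host IPC you choose; error checking omitted):

    #include <cuda_runtime.h>

    // Exporting side: an event can only be shared if it was created with
    // the interprocess flag, which in turn requires disabling timing
    cudaIpcEventHandle_t export_event(cudaEvent_t *ev)
    {
        cudaIpcEventHandle_t handle;
        cudaEventCreateWithFlags(ev, cudaEventInterprocess | cudaEventDisableTiming);
        cudaIpcGetEventHandle(&handle, *ev);
        return handle;  // send these raw bytes to the other process
    }

    // Importing side: reconstruct a usable event from the received handle;
    // the result can be passed to cudaStreamWaitEvent like a local event
    cudaEvent_t import_event(cudaIpcEventHandle_t handle)
    {
        cudaEvent_t ev;
        cudaIpcOpenEventHandle(&ev, handle);
        return ev;
    }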

You should also keep in mind that CUDA cannot be used in a child process if CUDA has been initialized in the parent process. This restriction is also accounted for in the sample code.

So you would do something like this (a code sketch for each process follows its step list):

Process A:

  • create (cudaMalloc) an allocation for the buffer that will hold the results to send to post-processing
  • create an event for synchronization
  • get CUDA IPC memory and event handles
  • using host-based IPC, communicate these handles to process B
  • launch the processing work (i.e. the GPU kernel) on the data, putting the results in the buffer created above
  • record the event into the same stream as the GPU kernel
  • signal process B via host-based IPC to launch its work
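A minimal sketch of process A under those steps, assuming Linux, a hypothetical host IPC layer (send_to_b/recv_from_b, e.g. over a pipe or UNIX socket), and a processing kernel of your own; error checking is omitted for brevity:

    #include <cuda_runtime.h>
    #include <stddef.h>

    extern void send_to_b(const void *buf, size_t len);  // hypothetical host IPC
    extern void recv_from_b(void *buf, size_t len);      // hypothetical host IPC
    __global__ void process(float *data, size_t n);      // your processing kernel

    int main(void)
    {
        const size_t n = 1 << 20;

        // Buffer that will hold the results for post-processing
        float *buf;
        cudaMalloc(&buf, n * sizeof(float));

        // Event for synchronization; must be interprocess-capable
        cudaEvent_t done;
        cudaEventCreateWithFlags(&done,
                                 cudaEventInterprocess | cudaEventDisableTiming);

        // Get the IPC handles and communicate them to process B
        cudaIpcMemHandle_t   mh;
        cudaIpcEventHandle_t eh;
        cudaIpcGetMemHandle(&mh, buf);
        cudaIpcGetEventHandle(&eh, done);
        send_to_b(&mh, sizeof(mh));
        send_to_b(&eh, sizeof(eh));

        // Launch the work and record the event in the same stream, so the
        // event completes only after the kernel has finished
        cudaStream_t s;
        cudaStreamCreate(&s);
        process<<<256, 256, 0, s>>>(buf, n);
        cudaEventRecord(done, s);

        // Signal process B that the event has been recorded
        int go = 1;
        send_to_b(&go, sizeof(go));

        // Process A owns the memory and event, so it must stay alive until
        // process B is done with them (see the caveat below)
        int finished = 0;
        recv_from_b(&finished, sizeof(finished));

        cudaStreamDestroy(s);
        cudaEventDestroy(done);
        cudaFree(buf);
        return 0;
    }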

Process B:

  • receive the memory and event handles from process A using host IPC
  • extract the memory pointer and create a local event from the handles
  • create a stream for work issue
  • wait for the signal from process A (indicating the event has been recorded)
  • perform cudaStreamWaitEvent in the created stream, using the local event
  • in that same stream, launch the post-processing kernel
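And the matching sketch of process B, under the same assumptions (send_to_a/recv_from_a are the same hypothetical host IPC layer; postprocess is your post-processing kernel):

    #include <cuda_runtime.h>
    #include <stddef.h>

    extern void send_to_a(const void *buf, size_t len);  // hypothetical host IPC
    extern void recv_from_a(void *buf, size_t len);      // hypothetical host IPC
    __global__ void postprocess(float *data, size_t n);  // your kernel

    int main(void)
    {
        const size_t n = 1 << 20;

        // Receive the handles exported by process A
        cudaIpcMemHandle_t   mh;
        cudaIpcEventHandle_t eh;
        recv_from_a(&mh, sizeof(mh));
        recv_from_a(&eh, sizeof(eh));

        // Map the foreign allocation and event into this process
        float *buf;
        cudaEvent_t done;
        cudaIpcOpenMemHandle((void **)&buf, mh, cudaIpcMemLazyEnablePeerAccess);
        cudaIpcOpenEventHandle(&done, eh);

        // Stream for work issue
        cudaStream_t s;
        cudaStreamCreate(&s);

        // Wait for process A's signal that the event has been recorded
        int go = 0;
        recv_from_a(&go, sizeof(go));

        // Queue the post-processing kernel behind the event: it will not
        // start until process A's kernel is complete
        cudaStreamWaitEvent(s, done, 0);
        postprocess<<<256, 256, 0, s>>>(buf, n);
        cudaStreamSynchronize(s);

        // Release the imported resources, then let process A exit
        cudaIpcCloseMemHandle(buf);
        cudaEventDestroy(done);
        cudaStreamDestroy(s);

        int finished = 1;
        send_to_a(&finished, sizeof(finished));
        return 0;
    }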

This should allow the post-processing kernel to begin only when the kernel from process A is complete, using the event interlock. One further caveat is that process A must not be allowed to terminate at any point during this. Since it is the owner of the memory and the event, it must continue to run as long as that memory or that event is required, even if they are required in another process. If that is a concern, it might make sense to make process B the "owner" and communicate the handles to process A instead.

Upvotes: 4
