Grigorii Alekseev

Reputation: 177

Synchronization between CUDA applications

Is there a way to synchronize two different CUDA applications on the same GPU?

I have two different parts of a pipeline: the original process and post-processing. The original process uses the GPU, and now we're going to migrate post-processing to the GPU as well. Our architecture requires that these two processes be organized as two separate applications. Now I'm thinking about the synchronization problem:

Is there some flag for that purpose? Or some workaround?

Upvotes: 1

Views: 1373

Answers (1)

talonmies

Reputation: 72372

Is there a way to synchronize two different CUDA applications on the same GPU?

In a word, no. You would have to do this via some sort of inter-process communication mechanism on the host side.

If you are on Linux, or on Windows with a GPU in TCC mode, host IPC will still be required, but you can "interlock" CUDA activity in one process with CUDA activity in another process using the CUDA IPC mechanism. In particular, it is possible to communicate an event handle to another process using cudaIpcGetEventHandle and cudaIpcOpenEventHandle. This provides an event that you can use in a cudaStreamWaitEvent call. Of course, this is really only half of the solution; you will also need CUDA IPC memory handles to share the data buffer itself. The CUDA simpleIPC sample code has most of the plumbing you need.
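As a rough sketch of just that event-handle exchange (export_event and import_event are hypothetical helper names; the handle bytes travel over whatever host IPC you choose; error checking omitted):

    #include <cuda_runtime.h>

    // Exporting side: an event can only be shared if it was created with
    // the interprocess flag, which in turn requires disabling timing
    cudaIpcEventHandle_t export_event(cudaEvent_t *ev)
    {
        cudaIpcEventHandle_t handle;
        cudaEventCreateWithFlags(ev, cudaEventInterprocess | cudaEventDisableTiming);
        cudaIpcGetEventHandle(&handle, *ev);
        return handle;  // send these raw bytes to the other process
    }

    // Importing side: reconstruct a usable event from the received handle;
    // the result can be passed to cudaStreamWaitEvent like a local event
    cudaEvent_t import_event(cudaIpcEventHandle_t handle)
    {
        cudaEvent_t ev;
        cudaIpcOpenEventHandle(&ev, handle);
        return ev;
    }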

You should also keep in mind that CUDA cannot be used in a child process if CUDA has been initialized in the parent process. This restriction is also accounted for in the sample code.

So you would do something like this (a code sketch for each process follows its step list):

Process A:

  • create (cudaMalloc) an allocation for the buffer that will hold the results to send to post-processing
  • create an event for synchronization
  • get CUDA IPC memory and event handles
  • using host-based IPC, communicate these handles to process B
  • launch the processing work (i.e. the GPU kernel) on the data, putting the results in the buffer created above
  • record the event into the same stream as the GPU kernel
  • signal process B via host-based IPC to launch its work
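A minimal sketch of process A under those steps, assuming Linux, a hypothetical host IPC layer (send_to_b/recv_from_b, e.g. over a pipe or UNIX socket), and a processing kernel of your own; error checking is omitted for brevity:

    #include <cuda_runtime.h>
    #include <stddef.h>

    extern void send_to_b(const void *buf, size_t len);  // hypothetical host IPC
    extern void recv_from_b(void *buf, size_t len);      // hypothetical host IPC
    __global__ void process(float *data, size_t n);      // your processing kernel

    int main(void)
    {
        const size_t n = 1 << 20;

        // Buffer that will hold the results for post-processing
        float *buf;
        cudaMalloc(&buf, n * sizeof(float));

        // Event for synchronization; must be interprocess-capable
        cudaEvent_t done;
        cudaEventCreateWithFlags(&done,
                                 cudaEventInterprocess | cudaEventDisableTiming);

        // Get the IPC handles and communicate them to process B
        cudaIpcMemHandle_t   mh;
        cudaIpcEventHandle_t eh;
        cudaIpcGetMemHandle(&mh, buf);
        cudaIpcGetEventHandle(&eh, done);
        send_to_b(&mh, sizeof(mh));
        send_to_b(&eh, sizeof(eh));

        // Launch the work and record the event in the same stream, so the
        // event completes only after the kernel has finished
        cudaStream_t s;
        cudaStreamCreate(&s);
        process<<<256, 256, 0, s>>>(buf, n);
        cudaEventRecord(done, s);

        // Signal process B that the event has been recorded
        int go = 1;
        send_to_b(&go, sizeof(go));

        // Process A owns the memory and event, so it must stay alive until
        // process B is done with them (see the caveat below)
        int finished = 0;
        recv_from_b(&finished, sizeof(finished));

        cudaStreamDestroy(s);
        cudaEventDestroy(done);
        cudaFree(buf);
        return 0;
    }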

Process B:

  • receive the memory and event handles from process A using host IPC
  • extract the memory pointer and create a local event from the handles
  • create a stream for work issue
  • wait for the signal from process A (indicating the event has been recorded)
  • perform cudaStreamWaitEvent in the created stream, using the local event
  • in that same stream, launch the post-processing kernel
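And the matching sketch of process B, under the same assumptions (send_to_a/recv_from_a are the same hypothetical host IPC layer; postprocess is your post-processing kernel):

    #include <cuda_runtime.h>
    #include <stddef.h>

    extern void send_to_a(const void *buf, size_t len);  // hypothetical host IPC
    extern void recv_from_a(void *buf, size_t len);      // hypothetical host IPC
    __global__ void postprocess(float *data, size_t n);  // your kernel

    int main(void)
    {
        const size_t n = 1 << 20;

        // Receive the handles exported by process A
        cudaIpcMemHandle_t   mh;
        cudaIpcEventHandle_t eh;
        recv_from_a(&mh, sizeof(mh));
        recv_from_a(&eh, sizeof(eh));

        // Map the foreign allocation and event into this process
        float *buf;
        cudaEvent_t done;
        cudaIpcOpenMemHandle((void **)&buf, mh, cudaIpcMemLazyEnablePeerAccess);
        cudaIpcOpenEventHandle(&done, eh);

        // Stream for work issue
        cudaStream_t s;
        cudaStreamCreate(&s);

        // Wait for process A's signal that the event has been recorded
        int go = 0;
        recv_from_a(&go, sizeof(go));

        // Queue the post-processing kernel behind the event: it will not
        // start until process A's kernel is complete
        cudaStreamWaitEvent(s, done, 0);
        postprocess<<<256, 256, 0, s>>>(buf, n);
        cudaStreamSynchronize(s);

        // Release the imported resources, then let process A exit
        cudaIpcCloseMemHandle(buf);
        cudaEventDestroy(done);
        cudaStreamDestroy(s);

        int finished = 1;
        send_to_a(&finished, sizeof(finished));
        return 0;
    }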

This should allow the post-processing kernel to begin only when the kernel from process A is complete, using the event interlock. One further caveat is that process A must not be allowed to terminate at any point during this. Since it is the owner of the memory and the event, it must continue to run as long as that memory or that event is required, even if they are required in another process. If that is a concern, it might make sense to make process B the "owner" and communicate the handles to process A instead.

Upvotes: 4
