einpoklum

Reputation: 131546

How do I mitigate CUDA's very long initialization delay?

Initializing CUDA in a newly-created process can take quite some time: half a second or more on many of today's server-grade machines. As @RobertCrovella explains, CUDA initialization usually includes establishing the Unified Memory model, which involves harmonizing the device and host memory maps. This can take quite a long time on machines with a lot of memory, and there may be other factors contributing to the delay as well.
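For reference, the delay is easy to observe by timing the very first CUDA call a fresh process makes. A minimal sketch (the `cudaFree(0)` idiom merely forces the lazy context creation; actual timings will vary by machine):

```cpp
// Time the first CUDA runtime call in a fresh process.
// cudaFree(0) is a common idiom for forcing lazy context initialization.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);   // triggers CUDA initialization on first use
    auto end = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("First CUDA call: %s, took %.1f ms\n", cudaGetErrorString(err), ms);
    return 0;
}
```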

This effect becomes quite annoying when you want to run a sequence of CUDA-utilizing processes which don't use complicated virtual memory mappings: each of them has to sit through its own long wait, despite the fact that, essentially, they could just re-use whatever initialization CUDA performed the last time (perhaps with a bit of cleanup code).

Now, obviously, if you somehow rewrote the code for all those processes to execute within a single process - that would save you those long initialization costs. But isn't there a simpler approach? What about:

Upvotes: 2

Views: 1550

Answers (1)

talonmies

Reputation: 72348

What you are asking about already exists. It is called MPS (Multi-Process Service), and it basically keeps a single GPU context alive at all times with a daemon process that emulates the driver API. The initial target application is MPI, but it does basically what you envisage.

Read more here:

https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf

http://on-demand.gputechconf.com/gtc/2015/presentation/S5584-Priyanka-Sah.pdf
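For completeness: per the overview document above, using MPS for a batch of processes amounts, roughly, to starting the control daemon before the batch and shutting it down afterwards (device selection, pipe/log directories and pre-Volta exclusive-mode setup omitted; see the documentation for the full procedure):

```sh
# Start the MPS control daemon; it spawns the shared server on the first client connection.
nvidia-cuda-mps-control -d

# ... run your sequence of CUDA-using processes here; they attach to the
#     shared server's GPU context instead of each initializing its own ...

# Shut down the daemon (and the server) when the batch is done.
echo quit | nvidia-cuda-mps-control
```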

Upvotes: 1
