Reputation: 1445
I have code that goes something like this:
1) Host: launch graphics kernels
2) Host: launch CUDA kernels (all async calls)
3) Host: do a bunch of number crunching on the host
4) Back to step 1
My question is this: the CUDA API guarantees that CUDA kernels, even when launched asynchronously, are executed in the order they were launched. Does this apply to the rendering? Let's say I have some rendering-related calculations in progress on the GPU. If I launch async CUDA calls, will they only be executed once the rendering is complete, or will the two operations overlap?
Also, if I call a CUDA device synchronize after step 2, it certainly forces the device to complete the CUDA-related function calls. What about rendering? Does it stall the host until the rendering-related operations are complete as well?
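For reference, a minimal sketch of the loop described in the question. The kernel names, launch configuration, and device pointer are all hypothetical, and the rendering work is assumed to be issued through a graphics API such as OpenGL:

```cpp
// Sketch of the host loop, under assumed names (renderFrame, myKernel,
// crunchNumbersOnHost, grid, block, d_data are all placeholders).
while (running) {
    renderFrame();                       // 1) issue graphics work (e.g. OpenGL draw calls)
    myKernel<<<grid, block>>>(d_data);   // 2) async launch: host returns immediately
    anotherKernel<<<grid, block>>>(d_data); // same stream, so runs after myKernel
    crunchNumbersOnHost();               // 3) runs on the host while GPU work is in flight
    cudaDeviceSynchronize();             // host blocks until queued CUDA work completes
}
```

Kernels launched into the same stream are guaranteed to execute in launch order; the open question here is only how they are ordered relative to the graphics work.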
Upvotes: 1
Views: 556
Reputation: 1507
Launching CUDA kernels effectively locks the GPU, so no other use of the GPU is supported at the same time. Each host process has to execute device code in a specific context, and only one context can be active on a single device at a time.
Calling cudaDeviceSynchronize();
blocks the calling host code. Once all streams of device code have finished executing, control is returned to the calling host code.
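As a small illustration of that blocking behavior (the kernel name and launch configuration are made up):

```cpp
// An async launch returns control to the host immediately.
slowKernel<<<1, 1>>>();          // hypothetical kernel; host does not wait here

// ... the host is free to do unrelated work at this point ...

cudaDeviceSynchronize();         // host blocks here until slowKernel and all
                                 // other previously queued device work finish

// After synchronizing, it is a good point to check for errors.
cudaError_t err = cudaGetLastError();
```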
EDIT:
See this very comprehensive but somewhat out-of-date answer, and you can study this paper to see what the latest devices are capable of. In short, launching a CUDA kernel, or even calling cudaSetDevice()
on a device that is concurrently being used by another thread, fails by throwing an error. If you would like to use your GPU from concurrent CUDA processes, there is a possibility (on Linux machines only) to use a kind of intermediate layer (called MPS) between the host threads and the CUDA API calls. This is described in my second link.
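If it helps, on Linux the MPS control daemon is typically started along these lines (the GPU index is an assumption, and the default pipe/log directories are used; see the linked documentation for the full setup):

```shell
# Make the target GPU visible and start the MPS control daemon.
export CUDA_VISIBLE_DEVICES=0     # assumed GPU index
nvidia-cuda-mps-control -d        # -d runs the control daemon in the background

# ... now launch the concurrent CUDA processes; their work is funneled
#     through a shared MPS server context on the device ...

# Shut the daemon down when done.
echo quit | nvidia-cuda-mps-control
```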
Upvotes: 1