Run Host code in the same Thread while CUDA Device code is executed

Question

Is there a way to run host code while a CUDA device function is running? Since the CUDA runtime has to wait until the device function has finished, I was wondering whether a provided host function (delegate) could be called in the meantime.

Something like this:

[image not available]

Starting a thread before the <<<>>> call is not the same thing for me [Overhead, ...].

Robert Crovella · Accepted Answer

CUDA kernel calls are asynchronous. This means that control is returned to the host thread that made the kernel call, before the kernel actually starts executing.

So you can run host code concurrently with a kernel simply by placing that host code immediately after the kernel call (and before any blocking CUDA API call such as cudaDeviceSynchronize() or cudaMemcpy()). The host code placed there runs concurrently with the kernel for as long as both the kernel and the host code are still executing. When you reach a point in your host code where you need results from the device (kernel), a non-async CUDA API call such as cudaDeviceSynchronize() or cudaMemcpy() will force the host thread to wait until the previously issued CUDA activity (kernels) is complete.
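A minimal sketch of that pattern, in case it helps. The kernel scale, the helper host_work and the wrapper overlap_demo are hypothetical names made up for illustration; only the CUDA runtime calls (cudaMalloc, cudaMemcpy, cudaFree) are real API. Error handling is kept to a bare minimum.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of a device array.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical stand-in for whatever host work should overlap with the kernel.
static long long host_work(int n)
{
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Launches the kernel, does host work in the same thread, then waits for the GPU.
bool overlap_demo()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h(n, 1.0f);

    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    if (cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return false;
    }

    // The launch is asynchronous: control returns to the host right away.
    scale<<<(n + 255) / 256, 256>>>(d, n);

    // Host code placed here can run while the kernel is executing.
    long long sum = host_work(1000);

    // Blocking call: the host thread waits here until the kernel has finished.
    // cudaDeviceSynchronize() would serve the same purpose without a copy.
    cudaError_t err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);

    return err == cudaSuccess && h[0] == 2.0f && h[n - 1] == 2.0f && sum == 499500;
}

int main()
{
    std::printf("%s\n", overlap_demo() ? "ok" : "failed");
    return 0;
}
```

No extra host thread is involved: the overlap comes purely from the ordering of the launch, the host work, and the first blocking call.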

You may wish to read about asynchronous concurrent execution in the programming guide.
