Reputation: 5186
Is there a way to run host code while a CUDA device function is running? Since the CUDA runtime has to wait until the device function has finished, I was wondering whether a provided host function (a delegate) could be called in the meantime.
Something like this:
Starting a thread before the <<<function>>> call is not the same for me [overhead, ...].
Upvotes: 4
Views: 868
Reputation: 151869
CUDA kernel calls are asynchronous. This means that control returns to the host thread that made the kernel call right away, typically before the kernel has actually started executing.
So you can run host code concurrently with a kernel simply by placing that host code immediately after the kernel call (and before any blocking CUDA API call such as cudaDeviceSynchronize() or cudaMemcpy()). The host code placed there will run concurrently with the kernel for as long as both the kernel and the host code are executing. When you reach a point in your host code where you need results from the device (kernel), a non-async CUDA API call such as cudaDeviceSynchronize() or cudaMemcpy() will force the host thread to wait until the previously issued CUDA activity (kernels) is complete.
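As a rough illustration of that ordering, here is a minimal sketch. The kernel scale, the helper host_work and the wrapper overlap_demo are hypothetical names made up for this example; only the CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of the array.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical host work that does not depend on the kernel's results.
long long host_work(int n)
{
    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += i;
    return sum;
}

// Launches the kernel, does host work while it may still be running,
// then waits for the results. Returns 0 on success, 1 on failure.
int overlap_demo(int n, long long *host_sum)
{
    std::vector<float> h(n, 1.0f);
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMemcpy(d, h.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d);
        return 1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d, n);   // control returns to the host right away

    *host_sum = host_work(n);           // host code overlapping with the kernel

    // Blocking copy: waits for the kernel to finish, then fetches the results.
    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return 1;

    for (float v : h)
        if (v != 2.0f) return 1;        // every element should have been doubled
    return 0;
}

int main()
{
    long long sum = 0;
    int rc = overlap_demo(1 << 20, &sum);
    printf("status=%d host_sum=%lld\n", rc, sum);
    return rc;
}
```

No extra host thread is involved here: the overlap comes purely from the order of the statements after the kernel launch.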
You may wish to read about asynchronous concurrent execution in the programming guide.
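If the host work is better expressed as a function that gets called repeatedly until the kernel finishes, which is closer to the delegate idea in the question, one option is to record a CUDA event after the launch and poll it with cudaEventQuery(). The sketch below is only an illustration under that assumption; busy, on_tick and poll_demo are hypothetical names, not part of any library.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that keeps the GPU busy for a while.
__global__ void busy(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical host callback standing in for the "delegate".
static int g_ticks = 0;
void on_tick() { ++g_ticks; }

// Launches the kernel and calls `callback` until the kernel has finished.
// Returns the number of callback invocations, or -1 on a CUDA error.
int poll_demo(void (*callback)())
{
    const int n = 1 << 20;
    float *d = nullptr;
    cudaEvent_t done;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMemset(d, 0, n * sizeof(float)) != cudaSuccess ||
        cudaEventCreate(&done) != cudaSuccess) {
        cudaFree(d);
        return -1;
    }

    busy<<<(n + 255) / 256, 256>>>(d, n, 10000);
    cudaEventRecord(done);              // marks the point right after the kernel

    int calls = 0;
    cudaError_t st;
    // cudaEventQuery() does not block: it returns cudaErrorNotReady
    // while the work recorded before the event is still in flight.
    while ((st = cudaEventQuery(done)) == cudaErrorNotReady) {
        callback();
        ++calls;
    }

    cudaEventDestroy(done);
    cudaFree(d);
    return st == cudaSuccess ? calls : -1;
}

int main()
{
    int calls = poll_demo(on_tick);
    printf("callback ran %d times while the kernel was executing\n", calls);
    return calls < 0;
}
```

How many times the callback runs depends on the GPU and on how long the kernel takes, so the count is not meaningful beyond showing that host code ran in the meantime.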
Upvotes: 4