Reputation: 303
Does the host wait for the device to finish its execution completely? For example, suppose the program has the following structure:
// cpu code segment
// data transfer from host to device
QUESTION - WILL THE CPU WAIT FOR THE DEVICE TO FINISH THE TRANSFER? IF NOT, CAN IT BE MADE TO WAIT? IF SO, HOW?
// kernel launch
QUESTION - WILL THE CPU WAIT FOR THE DEVICE TO FINISH KERNEL EXECUTION (ASSUMING THE KERNEL TAKES A NOTABLE TIME, SAY 5 SEC)? IF NOT, CAN IT BE MADE TO WAIT? IF SO, HOW?
// data transfer from device to host
// program terminates after printing some information
Upvotes: 18
Views: 21465
Reputation: 16816
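The synchronization functions of the CUDA runtime let you achieve what you want. Kernel launches return control to the host immediately, so the host has to synchronize explicitly whenever it needs to wait for the device.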
cudaDeviceSynchronize():
When you call this function, the CPU waits until the device has completed ALL previously issued work, whether memory copies or kernel executions.
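As a rough illustration of the structure in the question, here is a minimal sketch that launches a kernel and then blocks the host with cudaDeviceSynchronize(). The kernel `scaleKernel` and the helper `scaleAndWait` are hypothetical names made up for this example, and the block size of 256 is an arbitrary assumption. As far as I know, the plain cudaMemcpy() calls already block the host until the copy is done, so the explicit synchronization matters mainly for the kernel launch.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies each element by `factor`.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: copies `host` to the device, scales it there,
// waits for the device, and copies the result back into `host`.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t scaleAndWait(std::vector<float>& host, float factor) {
    if (host.empty()) return cudaSuccess;  // nothing to do, avoid a 0-block launch

    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(float);
    float* dev = nullptr;

    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) return err;

    // Host-to-device transfer with the blocking copy call.
    err = cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;  // assumed block size
        const int blocks = (n + threads - 1) / threads;
        // The launch itself returns to the host right away.
        scaleKernel<<<blocks, threads>>>(dev, n, factor);
        err = cudaGetLastError();  // reports launch errors, if any
    }

    // Block the host until all previously issued device work has finished.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Device-to-host transfer of the result.
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err;
}
```

Calling `scaleAndWait(v, 3.0f)` on a vector of floats should leave every element tripled once the function returns. Checking the return value of cudaDeviceSynchronize() is also a convenient place to catch errors that occurred while the kernel was running.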
cudaStreamSynchronize(cudaStream):
This function blocks the CPU until the specified CUDA stream has finished all the work queued in it so far. Other CUDA streams continue their execution asynchronously.
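The per-stream variant can be sketched the same way. In the example below, the copies and the kernel are queued on one stream with cudaMemcpyAsync() and the four-argument launch syntax, and the host waits only for that stream. The names `addOneKernel` and `addOneOnStream` are hypothetical, and the sketch uses ordinary pageable host memory for simplicity; for the copies to really overlap with host work you would likely want pinned memory from cudaMallocHost().

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to each element.
__global__ void addOneKernel(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: queues copy-in, kernel, and copy-out on a single
// stream, then waits for that stream only.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t addOneOnStream(std::vector<int>& host) {
    if (host.empty()) return cudaSuccess;  // avoid a 0-block launch

    const int n = static_cast<int>(host.size());
    const size_t bytes = host.size() * sizeof(int);
    int* dev = nullptr;
    cudaStream_t stream;

    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) return err;

    err = cudaMalloc(&dev, bytes);

    // Work queued on the same stream runs in issue order.
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(dev, host.data(), bytes,
                              cudaMemcpyHostToDevice, stream);
    if (err == cudaSuccess) {
        const int threads = 256;  // assumed block size
        const int blocks = (n + threads - 1) / threads;
        addOneKernel<<<blocks, threads, 0, stream>>>(dev, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(host.data(), dev, bytes,
                              cudaMemcpyDeviceToHost, stream);

    // The host must not read `host` before this call returns.
    cudaError_t syncErr = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = syncErr;

    cudaFree(dev);  // freeing a null pointer is a no-op
    cudaStreamDestroy(stream);
    return err;
}
```

Work submitted to other streams is not waited on here, which is the difference from cudaDeviceSynchronize().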
Upvotes: 32