Reputation: 193
I am running an iterative program in CUDA that runs until convergence. As explained in this SO post (Are cuda kernel calls synchronous or asynchronous), CUDA kernel launches are asynchronous from the CPU's point of view.
In my program, one of the kernels checks for convergence and returns a boolean value for the host to read. Do I need to call
cudaDeviceSynchronize()
before reading the boolean value?
Upvotes: 1
Views: 700
Reputation: 14565
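It depends on how you are returning the boolean value to the CPU. Are you using cudaMemcpy? If so, you don't have to call cudaDeviceSynchronize(): provided the kernel was launched in the default stream, a blocking cudaMemcpy waits for the kernel to finish, copies the data from GPU to CPU, and only then returns control to the host. If you instead use cudaMemcpyAsync, or read the flag through mapped (zero-copy) or managed memory, you do need to synchronize explicitly before reading it.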
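Here is a minimal sketch of what such a loop could look like, assuming everything runs in the default stream. The kernel `step`, the helper `iterate_until_converged`, and the "halve every element" update are made up for illustration and are not taken from your program; only the pattern of reading the flag with a blocking cudaMemcpy is the point.

```cuda
#include <cuda_runtime.h>

// Hypothetical update step: halves every element and clears *converged
// if any element is still above the tolerance.
__global__ void step(float *x, int n, float tol, bool *converged) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= 0.5f;
        // Every thread that writes stores the same value, so the race is harmless.
        if (x[i] > tol) *converged = false;
    }
}

// Hypothetical host loop. Returns the number of iterations executed,
// or -1 if a CUDA call fails.
int iterate_until_converged(float *d_x, int n, float tol, int max_iters) {
    bool *d_flag = nullptr;
    if (cudaMalloc(&d_flag, sizeof(bool)) != cudaSuccess) return -1;

    bool h_flag = false;
    int iters = 0;
    while (!h_flag && iters < max_iters) {
        // Reset the device flag to "converged" before each step.
        h_flag = true;
        cudaMemcpy(d_flag, &h_flag, sizeof(bool), cudaMemcpyHostToDevice);

        // The launch is asynchronous: control returns to the host immediately.
        step<<<(n + 255) / 256, 256>>>(d_x, n, tol, d_flag);

        // No cudaDeviceSynchronize() here. This blocking copy is queued in the
        // default stream behind the kernel and returns only after the flag
        // has arrived in host memory.
        cudaError_t err = cudaMemcpy(&h_flag, d_flag, sizeof(bool),
                                     cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) {
            cudaFree(d_flag);
            return -1;
        }
        ++iters;
    }

    cudaFree(d_flag);
    return iters;
}
```

Call it from the host with a device pointer, for example `iterate_until_converged(d_x, n, 0.1f, 100)`. Checking the return value of the device-to-host cudaMemcpy also surfaces errors from the preceding kernel, which is another reason the extra synchronize call is not needed in this pattern.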
Upvotes: 5