Reputation: 1291
If a CUDA kernel calls a __device__ function, is there any implicit synchronisation of the threads in the block on entering or exiting the __device__ function?
If not, does that mean some threads in the block could have exited the __device__ function before other threads in the block have even entered it (in the absence of any explicit synchronisation)?
Any pointers to relevant information/references would be appreciated.
Upvotes: 3
Views: 465
Reputation:
Yes, the only implicit synchronization is between threads within a warp, as talonmies pointed out. When the kernel is launched, the hardware picks a warp (probably the first one), executes an instruction for it, and then switches to another warp. It is entirely possible for one warp to retire before another warp has even executed the first instruction of the kernel, since warps are not required to wait for one another on exit.
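To illustrate the hazard this implies, here is a minimal sketch (the identifiers `fill_tile`, `tile`, `in`, and `out` are hypothetical): because there is no block-wide barrier on entry to or exit from the __device__ function, a thread may read a shared-memory slot that another thread's warp has not written yet.

```cuda
// Hypothetical example: no implicit block-wide sync around the __device__ call.
__device__ void fill_tile(float *tile, const float *in, int i)
{
    tile[threadIdx.x] = in[i];   // each thread writes its own slot
}

__global__ void kernel(const float *in, float *out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    fill_tile(tile, in, i);      // threads enter/leave this call independently

    // Race: the warp owning slot j may not have executed fill_tile() yet.
    int j = (threadIdx.x + 1) % blockDim.x;
    out[i] = tile[j];
}
```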
Upvotes: 1
Reputation: 393
You should use the __syncthreads() function in your kernel code. After calling the __device__ function, add a __syncthreads(); line to put a barrier there and synchronise the threads in the block.
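A minimal sketch of that pattern (identifiers are hypothetical, and it assumes n is a multiple of blockDim.x for brevity): the __syncthreads() barrier after the __device__ call guarantees every thread in the block has finished it before any thread reads the shared results. Note that __syncthreads() must be reached by all threads of the block, so it is placed outside any divergent branch.

```cuda
// Hypothetical example: barrier after the __device__ call makes its
// shared-memory writes visible to the whole block.
__device__ void fill_tile(float *tile, const float *in, int i)
{
    tile[threadIdx.x] = in[i];
}

__global__ void kernel(const float *in, float *out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    fill_tile(tile, in, i);

    __syncthreads();             // all writes to tile are now visible

    int j = (threadIdx.x + 1) % blockDim.x;
    out[i] = tile[j];            // safe: the neighbour's slot was written
}
```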
Upvotes: 0