Paul Caheny

Reputation: 1291

Is there implicit barrier synchronisation at the start or end of a CUDA __device__ function?

If a CUDA kernel calls a __device__ function, is there any implicit synchronisation between all the threads in the block when they either enter or exit the __device__ function?

If not, does that mean some threads in the block could exit the __device__ function before other threads in the block have even entered it (in the absence of any explicit synchronisation)?

Any pointers to relevant information/references would be appreciated.

Upvotes: 3

Views: 465

Answers (2)

user1545642

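Yes. As talonmies pointed out, the only implicit synchronization is between threads in the same warp; there is none across the whole block, and calling a __device__ function does not add any. When the kernel is launched, the hardware picks some warp (probably the first one), executes the first instruction for it, and then switches to another warp. It is therefore possible for one warp to retire before another warp has even executed the kernel's first instruction, because warps are not required to wait for one another on exit. (On Volta and newer GPUs, independent thread scheduling means even the threads of one warp are no longer guaranteed to run in lockstep, so warp-level code should use __syncwarp() or the *_sync intrinsics instead of relying on implicit synchronization.)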
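To illustrate the warp-versus-block distinction, here is a minimal sketch. The names warpSum, blockSum and runBlockSum are hypothetical, and it assumes a block size of 256 and CUDA 9 or later (for __shfl_down_sync). The __device__ function exchanges data only within a warp through a shuffle with a full mask, so it needs no block-wide barrier; combining the per-warp results written by different warps does need __syncthreads(), because nothing makes the warps of a block leave the function together.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;            // assumed block size (a multiple of 32)
constexpr int kWarps = kThreads / 32;

// Hypothetical helper: sums v across the 32 lanes of the calling warp.
// __shfl_down_sync exchanges data only inside one warp, and the full mask
// requires all 32 lanes to participate, so no block-wide barrier is needed.
__device__ int warpSum(int v) {
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;  // only lane 0 is guaranteed to hold the full warp total
}

// Writes one partial sum per block.
__global__ void blockSum(const int* in, int* out, int n) {
    __shared__ int partial[kWarps];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    int warp = threadIdx.x / 32;

    // Every thread calls warpSum (out-of-range threads contribute 0).
    int s = warpSum(gid < n ? in[gid] : 0);
    if (lane == 0) partial[warp] = s;

    // Warps may leave warpSum at different times; wait until every warp
    // has written its partial sum before thread 0 reads them.
    __syncthreads();

    if (threadIdx.x == 0) {
        int total = 0;
        for (int w = 0; w < kWarps; ++w) total += partial[w];
        out[blockIdx.x] = total;
    }
}

// Host helper: returns an empty vector if a CUDA call fails.
std::vector<int> runBlockSum(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    assert(n > 0 && "a launch with zero blocks is invalid");
    int blocks = (n + kThreads - 1) / kThreads;
    std::vector<int> result(blocks);
    int* dIn = nullptr;
    int* dOut = nullptr;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return {};
    if (cudaMalloc(&dOut, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(dIn);
        return {};
    }
    cudaError_t err = cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        blockSum<<<blocks, kThreads>>>(dIn, dOut, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(result.data(), dOut, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return {};
    return result;
}
```

For example, calling runBlockSum on 300 ones should return one partial sum per block, {256, 44}.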

Upvotes: 1

Bardia

Reputation: 393

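There is no implicit barrier, so you should use the __syncthreads() function in your kernel code: add a __syncthreads(); line right after the call to the __device__ function. This places a barrier that every thread in the block must reach before any of them continues. Make sure every thread in the block actually reaches that line (don't put it inside a branch that only some threads take), otherwise the kernel may hang or produce undefined results.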
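A minimal sketch of that pattern, assuming a block size of 256 (storeValue, neighborSum and runNeighborSum are hypothetical names): each thread stores a value through a __device__ function and then reads its right-hand neighbour's value, which is only safe after the barrier. All threads call the function and reach __syncthreads(); out-of-range threads store 0 instead of returning early.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // assumed block size

// Hypothetical helper: each thread writes its own slot of a shared buffer.
__device__ void storeValue(int* buf, int value) {
    buf[threadIdx.x] = value;
}

// out[i] = in[i] + in[i + 1], treating the neighbour as 0 past the end of
// the input or past the end of the block.
__global__ void neighborSum(const int* in, int* out, int n) {
    __shared__ int buf[kBlockSize];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;

    // Every thread calls the __device__ function (no early return before the barrier).
    storeValue(buf, gid < n ? in[gid] : 0);

    // No implicit barrier at the function exit: without this line a thread
    // could read buf[threadIdx.x + 1] before its neighbour has written it.
    __syncthreads();

    if (gid < n) {
        int right = (threadIdx.x + 1 < blockDim.x) ? buf[threadIdx.x + 1] : 0;
        out[gid] = buf[threadIdx.x] + right;
    }
}

// Host helper: returns an empty vector if a CUDA call fails.
std::vector<int> runNeighborSum(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    assert(n > 0 && "a launch with zero blocks is invalid");
    size_t bytes = n * sizeof(int);
    std::vector<int> result(n);
    int* dIn = nullptr;
    int* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return {};
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) {
        cudaFree(dIn);
        return {};
    }
    cudaError_t err = cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + kBlockSize - 1) / kBlockSize;
        neighborSum<<<blocks, kBlockSize>>>(dIn, dOut, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return {};
    return result;
}
```

For example, runNeighborSum({1, 2, 3, 4}) should return {3, 5, 7, 4}: each element plus its right-hand neighbour within the same block, with 0 past the end.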

Upvotes: 0
