BAdhi

Reputation: 510

Thread synchronization with syncwarp

Apart from the __syncthreads() family of functions, which synchronize the warps within a thread block, there is another function called __syncwarp(). What exactly does this function do?

The CUDA Programming Guide says that it

will cause the executing thread to wait until all warp lanes named in mask have executed a __syncwarp() (with the same mask) before resuming execution. All non-exited threads named in mask must execute a corresponding __syncwarp() with the same mask, or the result is undefined.

Executing __syncwarp() guarantees memory ordering among threads participating in the barrier. Thus, threads within a warp that wish to communicate via memory can store to memory, execute __syncwarp(), and then safely read values stored by other threads in the warp.

So does this mean that the function synchronizes the threads of a warp that are named in the mask? If so, why would we need such synchronization among threads of the same warp, given that they are all guaranteed to execute in lockstep?

Upvotes: 10

Views: 4913

Answers (1)

Mo Sani

Reputation: 408

This function is available from CUDA 9 onward, and yes, it synchronizes the threads of a warp that are named in the mask. It is useful when a warp has diverged. It matters in particular on the Volta architecture, where threads within a warp can be scheduled separately, so lockstep execution can no longer be assumed.
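To make the "store, __syncwarp(), then read" pattern from the quoted guide concrete, here is a minimal sketch of a single warp summing the values 0..31 through shared memory. The names warp_sum_kernel, warp_sum_demo, buf and full_mask are made up for illustration, the block is assumed to be exactly one warp of 32 threads, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one warp (32 threads) sums the values 0..31.
// Assumes it is launched with exactly 32 threads in a single block.
__global__ void warp_sum_kernel(int *out)
{
    __shared__ int buf[32];
    const unsigned full_mask = 0xffffffffu;  // all 32 lanes take part
    const int lane = threadIdx.x;            // block == one warp here

    buf[lane] = lane;        // each lane stores its own value
    __syncwarp(full_mask);   // stores are visible to the other lanes after this

    // Tree reduction: each round, the lower half adds in the upper half.
    for (int offset = 16; offset > 0; offset >>= 1) {
        if (lane < offset) {
            buf[lane] += buf[lane + offset];
        }
        // Called outside the branch so every lane named in the mask reaches it.
        // Without it, the next round could read a value that a separately
        // scheduled lane has not written yet.
        __syncwarp(full_mask);
    }

    if (lane == 0) {
        *out = buf[0];
    }
}

// Hypothetical host helper: runs the kernel and returns the result.
int warp_sum_demo()
{
    int *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    warp_sum_kernel<<<1, 32>>>(d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    printf("warp sum = %d\n", warp_sum_demo());
    return 0;
}
```

Older code often omitted the synchronization inside the loop and relied on implicit lockstep execution. With threads in a warp scheduled separately, that assumption is no longer safe, which is why the explicit __syncwarp() is there. On a working device the sum of 0..31 should come out as 496.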

Upvotes: 9
