Do warp vote functions synchronize threads in the warp?

Question

Do CUDA warp vote functions, such as __any() and __all(), synchronize threads in the warp?

In other words, is there any guarantee that all threads in the warp have executed the instructions preceding the warp vote function, especially the instruction(s) that compute the predicate?

ArchaeaSoftware · Accepted Answer

The synchronization is implicit because, on GPUs before the Volta architecture, threads within a warp execute in lockstep. [*]

Code that relies on this behavior is known as "warp synchronous."
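Below is a minimal sketch of the warp-synchronous pattern. It uses `__any_sync()` and `__all_sync()`, which replaced `__any()` and `__all()` in CUDA 9.0. The unsuffixed versions are deprecated, and they are not available when targeting compute capability 7.0 or higher. The `_sync` variants take an explicit participation mask. As I read the CUDA C++ Programming Guide, the lanes named in that mask are converged before the vote executes, so each lane's predicate is computed before the vote reads it. `vote_kernel`, `run_vote` and `threshold` are hypothetical names chosen for illustration. The launch assumes a single 32-thread warp, and error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// All 32 lanes of the warp participate in the vote.
constexpr unsigned FULL_MASK = 0xffffffffu;

// Hypothetical kernel: each lane computes a predicate, then the warp votes.
// The lanes named in FULL_MASK are converged before each vote executes,
// so every lane's `pred` has been computed when the vote reads it.
__global__ void vote_kernel(int threshold, int *any_out, int *all_out)
{
    int lane = threadIdx.x % 32;          // lane index within the warp
    bool pred = lane < threshold;         // the instruction(s) that set the predicate
    int any_result = __any_sync(FULL_MASK, pred);  // non-zero if any lane's pred is true
    int all_result = __all_sync(FULL_MASK, pred);  // non-zero if every lane's pred is true
    if (lane == 0) {                      // one lane writes the warp-wide results
        *any_out = any_result;
        *all_out = all_result;
    }
}

// Hypothetical host helper: launch one warp and return the votes as 0/1.
// Error checking is omitted for brevity.
void run_vote(int threshold, int *any_result, int *all_result)
{
    int *d = nullptr;
    cudaMalloc(&d, 2 * sizeof(int));
    vote_kernel<<<1, 32>>>(threshold, d, d + 1);
    int h[2] = {0, 0};
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(d);
    *any_result = (h[0] != 0);
    *all_result = (h[1] != 0);
}
```

For example, `run_vote(7, &a, &b)` gives `a == 1` because lanes 0 to 6 vote true, and `b == 0` because the remaining lanes vote false.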

[*] If you are thinking that conditional code will cause threads within a warp to follow different execution paths, you have more to learn about how CUDA hardware works. Divergent conditional code (i.e., conditional code where the condition is true for some threads but not for others) causes certain threads within the warp to be disabled, either by predication or by the branch synchronization stack, but each thread still occupies one of the 32 lanes available in the warp.
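The footnote's point about lanes can be illustrated with a vote issued from inside a divergent branch. The sketch below assumes a single 32-thread warp and uses hypothetical names (`divergent_ballot_kernel`, `run_divergent_ballot`), with error checking omitted. `__ballot_sync()` returns a 32-bit value in which bit N is set when lane N is active and its predicate is non-zero. Only even lanes take the branch, and every one of them votes true. The odd lanes therefore still appear in the result as their own zero bits, rather than being compacted away. Under the `_sync` API, the mask passed in must name exactly the lanes expected to reach the call, which here is `0x55555555`.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Bits 0, 2, 4, ..., 30 set: the even lanes of a warp.
constexpr unsigned EVEN_LANES = 0x55555555u;

// Hypothetical kernel: only even lanes enter the branch and vote.
// Odd lanes do not take the branch, but they keep their lane positions,
// so their bits in the ballot are simply zero.
__global__ void divergent_ballot_kernel(unsigned *out)
{
    int lane = threadIdx.x % 32;
    if (lane % 2 == 0) {
        // The mask names exactly the lanes that reach this call.
        unsigned ballot = __ballot_sync(EVEN_LANES, 1);
        if (lane == 0) *out = ballot;
    }
}

// Hypothetical host helper: launch one warp and return the ballot word.
unsigned run_divergent_ballot()
{
    unsigned *d = nullptr;
    cudaMalloc(&d, sizeof(unsigned));
    divergent_ballot_kernel<<<1, 32>>>(d);
    unsigned h = 0;
    cudaMemcpy(&h, d, sizeof(h), cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(d);
    return h;
}
```

Calling `run_divergent_ballot()` should return `0x55555555`, with one bit per lane.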
