CUDA: Is it possible to synchronize threads based on threadIdx?

Question

As the title says, I wonder whether there is a variant of __syncthreads() whose barrier applies at sub-block level rather than block level, so that I can synchronize only the threads that share a particular threadIdx.x.

For instance, if I launch a kernel with <<<1, dim3(32, 32)>>>, is there something like __syncthreads(5) that synchronizes all threads with threadIdx.x == 5?

According to the documentation, CUDA does not provide such a function; however, I wonder whether some trick can achieve the same result.

Answers (1)
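
One possible workaround, offered as a sketch rather than a definitive answer: in a dim3(32, 32) block, threads are linearized as threadIdx.x + threadIdx.y * blockDim.x, so the 32 threads that share a threadIdx.y form exactly one warp, whereas the 32 threads that share a threadIdx.x are spread over 32 different warps. If the algorithm allows swapping the roles of x and y (an assumption here), each group that needs its own barrier becomes a single warp and can be synchronized with a cooperative-groups tile of size 32 (or, equivalently, __syncwarp()). The names row_sync_kernel, run_row_sync_demo, kDim, and buf below are hypothetical and used only for illustration.

```cuda
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

constexpr int kDim = 32;  // block is kDim x kDim = 1024 threads

// Each row (fixed threadIdx.y) is one warp and synchronizes on its own,
// without waiting for the other 31 rows of the block.
__global__ void row_sync_kernel(int *out)
{
    __shared__ int buf[kDim][kDim];

    cg::thread_block block = cg::this_thread_block();
    // Tiles follow the linearized thread rank, so with blockDim.x == 32
    // tile i holds exactly the threads with threadIdx.y == i.
    cg::thread_block_tile<32> row = cg::tiled_partition<32>(block);

    const int g    = threadIdx.y;  // group index (the value to "sync on")
    const int lane = threadIdx.x;  // position inside the group

    buf[g][lane] = g * kDim + lane;

    // Barrier for the 32 threads of this row only; it also makes the
    // shared-memory writes above visible to the other threads of the row.
    row.sync();

    // Safe to read a value written by another thread of the same row.
    const int neighbor = (lane + 1) % kDim;
    out[g * kDim + lane] = buf[g][neighbor];
}

// Host-side check: launches one block and verifies every row's result.
bool run_row_sync_demo()
{
    const int n = kDim * kDim;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    row_sync_kernel<<<1, dim3(kDim, kDim)>>>(d_out);

    int h_out[kDim * kDim];
    // cudaMemcpy waits for the kernel and reports any launch/runtime error.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int g = 0; g < kDim; ++g)
        for (int lane = 0; lane < kDim; ++lane)
            if (h_out[g * kDim + lane] != g * kDim + (lane + 1) % kDim)
                return false;
    return true;
}
```

To try it, add a small driver such as int main() { return run_row_sync_demo() ? 0 : 1; } and compile with nvcc. Note that this does not synchronize threads with equal threadIdx.x in the original layout; it relies on rearranging the indexing so that each group to be synchronized coincides with a warp.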
