Reputation: 131656
The CUDA documentation tells us that the result of a warp shuffle is undefined if the origin thread is "inactive". Does that mean we can safely shuffle with only part of the threads, and only need to pay attention to the junk data coming from the inactive ones? Or might the entire shuffle output be garbage?
Upvotes: 2
Views: 238
Reputation: 1781
The documentation says: "If the target thread is inactive, the retrieved value is undefined."
My understanding is that the value returned to the thread that targeted an inactive thread is undefined. Threads that target an active thread behave as normal.
So you can get correct results from a shuffle in divergent code, as long as the thread you read from has followed the same path through the divergence as the thread doing the reading.
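As a rough illustration, here is a minimal sketch of a shuffle executed by only half of a warp. It assumes CUDA 9 or later, where `__shfl_sync` takes an explicit mask of participating lanes (the older `__shfl` is deprecated). The kernel name `partial_shuffle`, the 16-lane split, and the sentinel value `-1` are my own choices for the example, not anything from the documentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: only lanes 0..15 take the branch and shuffle.
__global__ void partial_shuffle(int *out)
{
    const int lane = threadIdx.x % warpSize;
    const int value = 100 + lane;   // per-lane payload
    int result = -1;                // sentinel for lanes that never shuffle

    if (lane < 16) {
        // The mask names exactly the lanes that execute this call.
        const unsigned mask = 0x0000FFFFu;
        // Every source lane is in 0..15, so it is active in this branch.
        const int src = (lane + 1) % 16;
        result = __shfl_sync(mask, value, src);
        // Reading from a lane outside the branch (e.g. src = lane + 16)
        // would return an undefined value to that reader only.
    }

    out[threadIdx.x] = result;
}

int main()
{
    int h_out[32];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    partial_shuffle<<<1, 32>>>(d_out);   // one block, one full warp

    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int lane = 0; lane < 32; ++lane)
        printf("lane %2d -> %d\n", lane, h_out[lane]);
    return 0;
}
```

Lanes 0 to 15 should each end up with the payload of their neighbour within the active half, while lanes 16 to 31 keep the sentinel because they never take part in the shuffle.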
Upvotes: 1