Reputation: 3438
The PTX documentation states that the bar.sync and bar.arrive barrier synchronization instructions can be used as follows:
bar.sync a{, b};
bar.arrive a, b;
Where
Source operand a specifies a logical barrier resource as an immediate constant or register with value 0 through 15. Operand b specifies the number of threads participating in the barrier.
It also shows an example where a producer-consumer model is established using these instructions:
// Producer code places produced value in shared memory.
st.shared [r0],r1;
bar.arrive 0,64;
...
// Consumer code, reads value from shared memory
bar.sync 0,64;
ld.shared r1,[r0];
...
I do not quite get the purpose of operand b in bar.arrive. In bar.sync this operand controls the number of threads involved in the barrier, so that waiting threads block until that count is reached, but its use for bar.arrive is not clear to me.
Upvotes: 2
Views: 626
Reputation: 7265
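Two things happen when all threads have arrived at a barrier:
1. The threads waiting at the barrier are released and resume execution.
2. The barrier is reset so that it can be reused.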
You are probably thinking only of 1., which can only happen at a bar.sync instruction. It is therefore obvious that bar.sync needs to know the number of threads participating in the barrier. However, the thread that completes the barrier may just as well be executing a bar.arrive, so the barrier can also be released at a bar.arrive, which therefore also needs to know the number of participating threads.
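To make this concrete, here is a minimal CUDA sketch of the producer-consumer pattern from the question, using inline PTX. It is an illustration under stated assumptions, not a reference implementation: the kernel name, the `slot` array, the stored values and the choice of barrier 1 (leaving barrier 0 to `__syncthreads()`) are all made up for the example, and the `__threadfence_block()` before `bar.arrive` is a conservative addition rather than something the question's example contains. Both instructions are given the same thread count, 64, which is the 32 threads of the producer warp plus the 32 threads of the consumer warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize   = 32;
constexpr int kNumThreads = 64;  // producer warp + consumer warp; passed to both instructions
constexpr int kBarrierId  = 1;   // hypothetical choice; barrier 0 is left for __syncthreads()

__global__ void producer_consumer(int *out)
{
    __shared__ int slot[kWarpSize];
    const int warp = threadIdx.x / kWarpSize;
    const int lane = threadIdx.x % kWarpSize;

    if (warp == 0) {
        // Producer warp: every lane publishes one value.
        slot[lane] = 2 * lane;
        __threadfence_block();  // conservative: order the store before signalling
        // Non-blocking: counts this warp's threads as arrived and moves on.
        // If the consumer is already waiting, this arrival completes the barrier.
        asm volatile("bar.arrive %0, %1;" :: "r"(kBarrierId), "r"(kNumThreads) : "memory");
        // The producer is free to do unrelated work here.
    } else {
        // Consumer warp: blocks until kNumThreads threads have arrived in total.
        asm volatile("bar.sync %0, %1;" :: "r"(kBarrierId), "r"(kNumThreads) : "memory");
        out[lane] = slot[lane];
    }
}

int main()
{
    int *d_out = nullptr;
    int h_out[kWarpSize] = {};

    cudaMalloc(&d_out, sizeof(h_out));
    producer_consumer<<<1, kNumThreads>>>(d_out);

    // cudaMemcpy synchronizes with the kernel and surfaces launch errors.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("consumer lane 5 read %d\n", h_out[5]);
    cudaFree(d_out);
    return 0;
}
```

Whichever warp reaches the barrier last brings the count to 64. If that is the producer, the barrier completes at its bar.arrive, which is why that instruction carries the count as well.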
Having said that, it is undocumented what happens if participating warps, or even threads, disagree on the number of threads involved. A daring, inquisitive mind could see this as an opportunity to find new (and unsupported!) synchronization constructs through reverse engineering.
Upvotes: 3