What does thread-count mean for bar.arrive PTX barrier synchronization instruction?

Question

The PTX documentation states that the bar.sync and bar.arrive barrier synchronization instructions can be used as follows:

bar.sync      a{, b};
bar.arrive    a, b;

Where

Source operand a specifies a logical barrier resource as an immediate constant or register with value 0 through 15. Operand b specifies the number of threads participating in the barrier.

It also shows an example where a producer-consumer model is established using these instructions:

// Producer code places produced value in shared memory.
st.shared [r0],r1; 
bar.arrive 0,64;
...

// Consumer code, reads value from shared memory 
bar.sync 0,64; 
ld.shared r1,[r0];
...

I do not quite get the purpose of operand b in bar.arrive. In bar.sync, this operand sets the number of threads involved in the barrier, and the calling thread waits until that thread count is reached. Its use in bar.arrive, which does not wait, is not clear to me.

tera · Accepted Answer

Two things happen when all threads have arrived at a barrier:

1. All waiting threads are allowed to proceed beyond the barrier.
2. The barrier is re-initialized so it is ready to be used again.

You are probably thinking only of the first of these, and threads can only be waiting at a bar.sync instruction. It is therefore obvious that a bar.sync needs to know the number of threads participating in the barrier. However, the arrival that completes the barrier can just as well happen at a bar.arrive, which therefore also needs to know the number of participating threads.

Having said that, it is undocumented what happens if participating warps, or even individual threads, disagree on the number of threads involved. This could be seen as an opportunity for a daring, inquisitive mind to find possible new (and unsupported!) synchronization constructs through reverse-engineering.
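
To make the point about the thread count concrete, here is a minimal CUDA sketch of the producer-consumer pattern from the question, using inline PTX. The kernel name, the buffer layout, the 64-thread launch, and the __threadfence_block() call are my own assumptions for illustration and are not taken from the PTX documentation. Both warps pass the same count (64), so whichever instruction contributes the final arrival, bar.arrive or bar.sync, can complete and reset the barrier.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: launch with exactly 64 threads (2 warps).
// Warp 0 produces, warp 1 consumes. Barrier 0 is also the barrier used by
// __syncthreads(), so this sketch deliberately never calls __syncthreads().
__global__ void producer_consumer(int *out)
{
    __shared__ int buf[32];
    const int lane = threadIdx.x % 32;

    if (threadIdx.x < 32) {
        // Producer warp: each lane publishes one value to shared memory.
        buf[lane] = 100 + lane;
        __threadfence_block();  // conservative extra fence (an assumption, may be redundant)
        // Signal arrival without waiting. If these are the last of the 64
        // expected arrivals, this is the instruction that completes barrier 0.
        asm volatile("bar.arrive 0, 64;" ::: "memory");
    } else {
        // Consumer warp: wait until 64 threads have arrived at barrier 0.
        asm volatile("bar.sync 0, 64;" ::: "memory");
        out[lane] = buf[lane];
    }
}

int main()
{
    int *d_out = nullptr;
    int h_out[32] = {0};

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }

    producer_consumer<<<1, 64>>>(d_out);

    // cudaMemcpy synchronizes with the kernel and reports any launch error.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every consumer lane should have read the value its producer lane wrote.
    bool ok = true;
    for (int i = 0; i < 32; ++i) {
        if (h_out[i] != 100 + i) ok = false;
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Note that the branch is uniform per warp, so all threads of a warp execute the same bar instruction. Giving the two instructions different counts would land in exactly the undocumented territory described above.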
