Does Partially Identical Address Access Incur Bank Conflicts in CUDA?

Question

I have read some tutorials on CUDA programming. Most of them mention "If ALL threads of a half-warp access the identical address, there is no bank conflict (broadcast)". My question is whether partially identical address access will incur bank conflicts in shared memory in CUDA.

Assume each warp has 32 threads, so a half-warp has 16 threads.

(1) If all 16 threads access the same address A on Bank0, there will be no bank conflict because the word is broadcast.

(2) But what if Thread-{0,1,...,6,7} want to access address A on Bank0 while Thread-{8,9,...,14,15} want to access address B on Bank1? I wonder whether there will be bank conflicts. My guess is that, since NOT all threads of the half-warp access the identical address (only half of the half-warp does), there will be bank conflicts.

Please correct me if my understanding is wrong. Thank you very much!

Robert Crovella · Accepted Answer

For compute capability 1.x (devices that are no longer supported in CUDA 7), a single broadcast word is allowed per non-bank-conflicted shared memory access cycle.

For compute capability 2.0 and beyond, any number of broadcast words are allowed in a single non-bank-conflicted shared memory access cycle, assuming all of those broadcast words are from separate banks.

Documentation:

"... and unlike for devices of compute capability 1.x, multiple words can be broadcast in a single transaction"

Discussions of half-warps are only relevant to cc1.x devices. In your case 2, on cc1.x devices, the two accesses would have to be serialized, one for address A and one for address B. This is behaviorally equivalent to a 2-way bank conflict. For cc2.0 and beyond, your case 2 would incur no bank conflicts.
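To make the two rules concrete, here is a rough host-side C++ sketch (not device code, and not a measurement) that counts how many serialized passes a set of thread addresses would need under each rule. The function names `passes_cc1x` and `passes_cc2x` are hypothetical. It assumes 4-byte bank words, 16 banks per half-warp on cc1.x and 32 banks per warp on cc2.0+. It also assumes the cc1.x hardware picks the first remaining address as its broadcast word, a choice the hardware does not actually guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// cc 2.0+ model: threads reading the same 32-bit word share a broadcast,
// and any number of words may be broadcast at once. Only DISTINCT words
// that fall into the same bank have to be serialized.
int passes_cc2x(const std::vector<std::uint32_t>& byte_addrs, int num_banks = 32) {
    assert(num_banks > 0);
    std::map<std::uint32_t, std::set<std::uint32_t>> words_per_bank;
    for (std::uint32_t a : byte_addrs) {
        std::uint32_t word = a / 4;                  // assumed 4-byte bank width
        words_per_bank[word % num_banks].insert(word);
    }
    int passes = 0;
    for (const auto& kv : words_per_bank)
        passes = std::max(passes, static_cast<int>(kv.second.size()));
    return passes;
}

// cc 1.x model (one half-warp): each pass serves every thread reading the
// single broadcast word, plus at most ONE address for each other bank.
int passes_cc1x(std::vector<std::uint32_t> remaining, int num_banks = 16) {
    assert(num_banks > 0);
    int passes = 0;
    while (!remaining.empty()) {
        // Assumption: the broadcast word is the first remaining address.
        std::uint32_t bcast_word = remaining.front() / 4;
        std::uint32_t bcast_bank = bcast_word % num_banks;
        std::set<std::uint32_t> banks_used;          // banks already serving one address
        std::vector<std::uint32_t> next;
        for (std::uint32_t a : remaining) {
            std::uint32_t word = a / 4;
            std::uint32_t bank = word % num_banks;
            if (word == bcast_word) continue;        // served by the broadcast
            if (bank != bcast_bank && banks_used.insert(bank).second)
                continue;                            // first address seen for this bank
            next.push_back(a);                       // must wait for a later pass
        }
        remaining = next;
        ++passes;
    }
    return passes;
}
```

For case 2 (eight threads at byte address 0 on Bank0, eight threads at byte address 4 on Bank1), this model gives 2 passes under the cc1.x rule and 1 pass under the cc2.0+ rule, in line with the explanation above. Two different words in the same bank, such as byte addresses 0 and 128 with 32 banks, still need 2 passes under the cc2.0+ rule.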
