Bank conflicts of shared memory in reduction pattern?

Question

I was studying the bank conflict problem illustrated in Mark Harris's slides "Optimizing Parallel Reduction in CUDA", and I ran into this question:

Slides 8 and 12, which demonstrate the divergent branch problem and the non-divergent solution respectively, would seem to have the same bank conflict problem. But this issue is only mentioned on slide 12.

As far as I know, each successive 4-byte word (an integer in this case) is stored in the next memory bank. In this case:

10 -> bank 1
1  -> bank 2
8  -> bank 3
   .
   .
   .

and a bank conflict occurs whenever threads of the same warp request data from the same bank. In slide 8, all 6 threads (of the same warp) request data from different banks, so there is no bank conflict. In slide 12, all 6 threads (again, of the same warp) also request data from different banks, so there is still no bank conflict. Could someone clarify when exactly this problem arises?
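
To make the bank mapping above concrete, here is a minimal host-side sketch in plain C++ (not a CUDA kernel) that counts how many active threads of the first warp land in the same bank. It assumes 32 banks of 4-byte words and a warp of 32 threads, and it assumes the two slides use the indexing `tid % (2*s) == 0` with `sdata[tid]` (slide 8) and `index = 2*s*tid` with `sdata[index]` (slide 12). None of those figures are stated in the post, and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <array>

// Assumptions for illustration only: 32 banks of 4-byte words, 32 threads per warp.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Bank holding the 4-byte word at position word_index of a shared int array.
int bank_of(int word_index) { return word_index % kBanks; }

// Largest number of accesses that fall into one bank.
// 1 means conflict-free, 2 means a 2-way conflict, and so on.
// Every index passed in is a distinct word, so equal banks really do conflict.
int max_conflict_degree(const int* idx, int n) {
    std::array<int, kBanks> hits{};
    for (int i = 0; i < n; ++i) ++hits[bank_of(idx[i])];
    return *std::max_element(hits.begin(), hits.end());
}

// Assumed slide 8 pattern: if (tid % (2*s) == 0) sdata[tid] += sdata[tid + s];
// Only the sdata[tid] read is modelled; sdata[tid + s] is the same pattern shifted by s.
int interleaved_degree(int s, int block_dim) {
    int idx[kWarpSize];
    int n = 0;
    for (int tid = 0; tid < kWarpSize && tid < block_dim; ++tid)
        if (tid % (2 * s) == 0) idx[n++] = tid;
    return max_conflict_degree(idx, n);
}

// Assumed slide 12 pattern: index = 2*s*tid; if (index < blockDim.x) sdata[index] += sdata[index + s];
// Only the sdata[index] read is modelled; sdata[index + s] is the same pattern shifted by s.
int strided_degree(int s, int block_dim) {
    int idx[kWarpSize];
    int n = 0;
    for (int tid = 0; tid < kWarpSize && tid < block_dim; ++tid) {
        int index = 2 * s * tid;
        if (index < block_dim) idx[n++] = index;
    }
    return max_conflict_degree(idx, n);
}
```

Under these assumptions, `strided_degree(1, 16)` returns 1, which matches the small picture on the slide where every thread touches a different bank. With a 128-element block, `strided_degree(1, 128)` returns 2 and `strided_degree(2, 128)` returns 4, because the stride `2*s` makes indices wrap around the 32 banks. `interleaved_degree` returns 1 for the same inputs. If the sketch reflects the slides correctly, the conflict only shows up once a warp's strided indices span more than 32 words, which the 6-thread illustration is too small to show.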


Answers (1)
