I was focusing on the bank conflict problem illustrated in Mark Harris's slides "Optimizing Parallel Reduction in CUDA" and came up with this question:
Slides 8 and 12, which demonstrate the divergent-branch problem and the non-divergent solution respectively, both seem to be subject to bank conflicts, yet the issue is only mentioned on slide 12.
As far as I know, every 4 bytes of data (one integer in this case) are stored in one memory bank, so consecutive integers go to consecutive banks:
10 -> bank 1
1 -> bank 2
8 -> bank 3
...
A bank conflict occurs whenever threads of the same warp request different words that lie in the same bank. In slide 8, all 6 threads (of the same warp) request data from different banks, so there is no bank conflict. In slide 12, all 6 threads (again, of the same warp) also request data from different banks, so there should still be no bank conflict. Could someone clarify when exactly this problem arises?
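The word-to-bank mapping described above can be modeled on the host. This is a minimal C++ sketch, assuming 32 banks of 4-byte words as on compute capability 2.x and later GPUs (the G80-era hardware the slides targeted had 16 banks, with conflicts resolved per half-warp). `bankOf` and `conflictDegree` are hypothetical helper names, not part of the CUDA API, and banks are numbered from 0 rather than 1 as in the list above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Assumption: 32 banks, each one 32-bit (4-byte) word wide.
constexpr std::size_t kNumBanks = 32;
constexpr std::size_t kWordBytes = 4;

// Bank (numbered from 0) holding the word at this byte offset in shared memory.
std::size_t bankOf(std::size_t byteOffset) {
    return (byteOffset / kWordBytes) % kNumBanks;
}

// Conflict degree of one warp-wide request: the largest number of *distinct*
// words that fall into a single bank. wordIndices[t] is the 32-bit word index
// accessed by lane t. Lanes reading the same word are served by a broadcast,
// so duplicates do not add to the count. A result of 1 means conflict-free.
std::size_t conflictDegree(const std::vector<std::size_t>& wordIndices) {
    assert(!wordIndices.empty());  // a request has at least one active lane
    std::map<std::size_t, std::set<std::size_t>> wordsPerBank;
    for (std::size_t w : wordIndices) {
        wordsPerBank[bankOf(w * kWordBytes)].insert(w);
    }
    std::size_t degree = 1;
    for (const auto& entry : wordsPerBank) {
        degree = std::max(degree, static_cast<std::size_t>(entry.second.size()));
    }
    return degree;
}
```

For example, if lane t reads word t, `conflictDegree` returns 1; if every lane reads word 0 it is still 1 (a broadcast); if lane t reads word 2t it returns 2, because lanes t and t + 16 hit the same bank with different words.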
I guess I'm wrong.
In slide 8, during the first step the active threads are those with an even index 2i, and thread 2i accesses element 2i, which sits in bank 2i. So the threads of a warp (indices 0 to 31) touch banks 0 to 31 at most once each: no two threads of the warp access the same bank, and no bank conflicts occur.
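This can be checked with a small host-side model of one loop step. The sketch assumes slide 8 corresponds to the divergent kernel whose loop body is `if (tid % (2*s) == 0) sdata[tid] += sdata[tid + s];`, assumes 32 four-byte banks, and looks only at warp 0 (threads 0 to 31). `divergentStepDegree` is a hypothetical name; the snippet carries its own small conflict-degree helper so it stands alone.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Assumption: 32 four-byte banks and 32 lanes per warp.
constexpr unsigned kBanks = 32;
constexpr unsigned kWarpSize = 32;

// Largest number of distinct 32-bit words that share one bank (1 = no conflict).
unsigned conflictDegree(const std::vector<unsigned>& words) {
    assert(!words.empty());
    std::map<unsigned, std::set<unsigned>> perBank;
    for (unsigned w : words) perBank[w % kBanks].insert(w);
    unsigned degree = 1;
    for (const auto& entry : perBank)
        degree = std::max(degree, static_cast<unsigned>(entry.second.size()));
    return degree;
}

// Models one step of the divergent kernel for warp 0:
//   if (tid % (2*s) == 0) sdata[tid] += sdata[tid + s];
// Returns the worse of the two operands' conflict degrees.
unsigned divergentStepDegree(unsigned s) {
    std::vector<unsigned> lhs, rhs;
    for (unsigned tid = 0; tid < kWarpSize; ++tid) {
        if (tid % (2 * s) == 0) {    // divergent branch: only some lanes are active
            lhs.push_back(tid);      // word index of sdata[tid]
            rhs.push_back(tid + s);  // word index of sdata[tid + s]
        }
    }
    return std::max(conflictDegree(lhs), conflictDegree(rhs));
}
```

Every active lane has a distinct `tid` below 32, so both operands land in distinct banks and the model returns 1 for every stride: the cost of this kernel is divergence, not bank conflicts.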
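In slide 12, thread i accesses element 2i, which sits in bank 2i mod 32 (assuming 32 banks). The last thread in the warp, thread 31, accesses element 62, i.e. bank 30. Because the warp's indices 0 to 62 wrap around the 32 banks twice, threads i and i + 16 land in the same bank with different words (threads 0 and 16, for example, both hit bank 0 with elements 0 and 32). That is a 2-way bank conflict in the first step, and it grows as the stride doubles (4-way, 8-way, and so on), which makes it a serious problem.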
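The same kind of model applied to slide 12 shows the conflicts growing with the stride. This sketch assumes slide 12 corresponds to the kernel whose loop body is `int index = 2 * s * tid; if (index < blockDim.x) sdata[index] += sdata[index + s];`, assumes 32 four-byte banks, and again looks only at warp 0. `stridedStepDegree` is a hypothetical name, and the block size is passed in explicitly because it decides which lanes stay active.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Assumption: 32 four-byte banks and 32 lanes per warp.
constexpr unsigned kBanks = 32;
constexpr unsigned kWarpSize = 32;

// Largest number of distinct 32-bit words that share one bank (1 = no conflict).
unsigned conflictDegree(const std::vector<unsigned>& words) {
    assert(!words.empty());
    std::map<unsigned, std::set<unsigned>> perBank;
    for (unsigned w : words) perBank[w % kBanks].insert(w);
    unsigned degree = 1;
    for (const auto& entry : perBank)
        degree = std::max(degree, static_cast<unsigned>(entry.second.size()));
    return degree;
}

// Models one step of the strided-index kernel for warp 0:
//   int index = 2 * s * tid;
//   if (index < blockDim.x) sdata[index] += sdata[index + s];
unsigned stridedStepDegree(unsigned s, unsigned blockDim) {
    assert(blockDim > 0);  // guarantees lane 0 (index 0) is always active
    std::vector<unsigned> lhs, rhs;
    for (unsigned tid = 0; tid < kWarpSize; ++tid) {
        unsigned index = 2 * s * tid;
        if (index < blockDim) {
            lhs.push_back(index);      // word index of sdata[index]
            rhs.push_back(index + s);  // word index of sdata[index + s]
        }
    }
    return std::max(conflictDegree(lhs), conflictDegree(rhs));
}
```

With `blockDim = 256` this gives 2 for `s = 1` and 4 for `s = 2`; with 1024 threads, `s = 16` sends all 32 lanes to bank 0, a 32-way conflict.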