Bank conflicts in 2.x devices

Question

What is a bank conflict on devices of compute capability 2.x? As I understand the CUDA C Programming Guide, on 2.x devices, if two threads read the same 32-bit word in the same shared memory bank, there is no bank conflict. Instead, the word is broadcast. If two threads write to the same 32-bit word in the same shared memory bank, only one of the writes succeeds.

Since the on-chip memory is 64 KB (48 KB of shared memory and 16 KB of L1 cache, or vice versa) and is organized in 32 banks, I am assuming that each bank holds 2 KB. So I think a bank conflict arises when two threads access two different 32-bit words in the same shared memory bank. Is this correct?

harrism · Accepted Answer

Your description is correct. There are many access patterns that can generate bank conflicts, but here's a simple and common example: strided access.

__shared__ int smem[512];

int tid = threadIdx.x;

int x = smem[tid * 2]; // 2-way bank conflicts
int y = smem[tid * 4]; // 4-way bank conflicts
int z = smem[tid * 8]; // 8-way bank conflicts
// etc.

For 32-bit elements, bank ID = index % 32. If you look at the addresses generated by the x, y, and z accesses within one warp of 32 threads, you can see the pattern: for x, the threads touch only 16 distinct banks, and 2 threads access different words in each of them. For y, 4 threads access each of 8 banks, and for z, 8 threads access each of 4 banks.
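To check the arithmetic for other strides, here is a minimal host-side sketch in plain C++ (no GPU needed). It is only a simplified model, not how the hardware is implemented: it assumes 32 banks, 4-byte words, a full 32-thread warp issuing one request, and a non-negative stride. The names `conflict_degree`, `kNumBanks`, and `kWarpSize` are made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <set>

// Assumed model parameters for compute capability 2.x shared memory.
constexpr int kNumBanks = 32;  // banks of 32-bit words
constexpr int kWarpSize = 32;  // threads issuing one request together

// Returns the worst-case number of *distinct* words that one warp requests
// from a single bank when thread `tid` reads smem[tid * stride].
// Threads reading the same word are counted once, which models broadcast.
int conflict_degree(int stride) {
    std::array<std::set<int>, kNumBanks> words_per_bank;
    for (int tid = 0; tid < kWarpSize; ++tid) {
        int word = tid * stride;                        // 32-bit word index
        words_per_bank[word % kNumBanks].insert(word);  // bank ID = index % 32
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

Under this model, `conflict_degree(2)`, `conflict_degree(4)`, and `conflict_degree(8)` return 2, 4, and 8, matching the comments in the snippet above. `conflict_degree(1)` and any odd stride such as `conflict_degree(3)` return 1 (no conflict), and `conflict_degree(0)` also returns 1 because every thread reads the same word.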
