Reputation: 253
I am reading the CUDA C Programming Guide, and in the shared memory section I came across this example for a device of compute capability 1.0 with 16 shared memory banks:
extern __shared__ float shared[];
float data = shared[BaseIndex + s * tid];
The explanation concludes that s has to be odd to avoid bank conflicts. Can anyone please help me understand what happens when s is even and what happens when s is odd?
Upvotes: 0
Views: 254
Reputation: 1507
The conclusion that s must be odd is not easy to see directly, but you can derive it by working out when a bank conflict occurs, i.e. when two different threads tid and tid' access the same bank. Assuming the number of banks is 32:
s*tid == s*tid' (mod 32)
s*tid == s*(tid + n) (mod 32) where tid' = tid + n
s*tid == s*tid + s*n (mod 32)
s*n == 0 (mod 32)
n = (32/d)*k for some integer k, where d = gcd(s, 32)
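The smallest positive n that satisfies this is 32/d. Two different threads of a 32-thread warp differ by at most 31, so no bank conflict occurs when 32 is less than or equal to 32/d, which means d = 1.
Since d = gcd(s, 2^5), d = 1 exactly when s is odd. If s is even, d is at least 2 and threads tid and tid + 32/d hit the same bank. The same argument works for the 16 banks in your example: replace 32 with 16 = 2^4 and consider the 16 threads of a half-warp.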
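To check the derivation above numerically, here is a minimal host-side sketch in plain C. It is only a simplified model, not how the hardware is implemented: it assumes the bank of a 32-bit word is its word index modulo the number of banks, and it counts how many threads land in the busiest bank. The helper names gcd and conflict_degree, and the 64-bank limit of the counter array, are hypothetical choices made for this illustration.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Simplified model (an assumption for illustration): thread tid reads the
 * word at index base + s * tid, and that word lives in bank
 * (base + s * tid) % num_banks.
 * Returns the largest number of threads that map to one bank:
 * 1 means conflict free, k means a k-way conflict in this model.
 * The model does not account for any broadcast of identical words.
 */
static unsigned conflict_degree(unsigned s, unsigned base,
                                unsigned num_threads, unsigned num_banks) {
    unsigned hits[64] = {0};   /* per-bank access counters */
    unsigned worst = 0;
    unsigned tid;

    assert(num_banks > 0 && num_banks <= 64);

    for (tid = 0; tid < num_threads; ++tid) {
        unsigned bank = (base + s * tid) % num_banks;
        hits[bank]++;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

For example, conflict_degree(3, 0, 16, 16) returns 1 (odd stride, no conflict), while conflict_degree(2, 0, 16, 16) returns 2 and conflict_degree(8, 0, 16, 16) returns 8. When the number of threads equals the number of banks, the result equals gcd(s, num_banks), which matches the d in the derivation.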
About your question in the comments: I didn't fully get which part you don't understand, but the simple explanation is that if two threads try to access the same bank (that is, two different words that map to the same bank), the accesses are serialized.
Upvotes: 1