Reputation: 169
I have to use a shared memory array of 64 elements, twice the number of banks and twice the number of threads in a warp. How should I index it to get bank-conflict-free access?
Upvotes: 0
Views: 871
Reputation: 1223
Let's assume you're using compute capability 1.x, so your shared memory has 16 banks, and each thread has to access 2 elements in shared memory.
What you want is for each thread to access the same memory bank for both of its elements, so if you organize the data such that a thread's two elements are 16 elements apart, you should avoid bank conflicts.
__shared__ int shared[32];
int first  = shared[base + stride * tid];       // first element for this thread
int second = shared[base + stride * tid + 16];  // same bank, 16 words further on
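The pattern above can be sanity-checked on the host. This is a minimal sketch, assuming the usual model in which consecutive 32-bit words map to consecutive banks (word i lives in bank i % banks). The function name `conflict_free` and its parameters are hypothetical, introduced only for illustration; the code runs on the CPU and measures nothing on the device.

```cpp
#include <cassert>
#include <set>

// Host-side model (an assumption, not a device measurement): shared memory is
// split into `banks` banks of 32-bit words, and word i lives in bank i % banks.
// Returns true if `threads` threads reading word (base + stride * tid + offset)
// all land in different banks. Indices are assumed to be non-negative.
bool conflict_free(int banks, int threads, int base, int stride, int offset) {
    assert(banks > 0 && threads > 0);
    std::set<int> used;
    for (int tid = 0; tid < threads; ++tid) {
        int word = base + stride * tid + offset;
        int bank = word % banks;
        if (!used.insert(bank).second) {
            return false;  // another thread already uses this bank
        }
    }
    return true;
}
```

Under this model, `conflict_free(16, 16, 0, 1, 0)` and `conflict_free(16, 16, 0, 1, 16)` both hold for a half-warp on 16 banks, and the same is true for 32 threads on 32 banks with an offset of 32, which matches the 64-element case in the question.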
I used this pattern to store complex floats, but since I had an array of them, it looked like this:
#define TILE_WIDTH 16
__shared__ float shared[TILE_WIDTH][2*TILE_WIDTH + 1];
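// row is the tile row this thread works on (name assumed for illustration)
float real = shared[row][tid];               // real parts fill the first TILE_WIDTH columns
float imag = shared[row][tid + TILE_WIDTH];  // imaginary parts sit TILE_WIDTH columns later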
The +1 pads each row by one element so that transposed (column-wise) accesses fall in different banks instead of being serialized.
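To see why the extra column helps, here is a small host-side sketch, again assuming word i maps to bank i % banks. It counts how many threads land in the busiest bank when each thread reads one element of the same column of a row-major tile. The name `max_threads_per_bank` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Worst-case number of threads hitting a single bank when `threads` threads
// read one column of a row-major tile: thread t reads element [t][col].
// row_width is the number of 32-bit words per row, including any padding.
int max_threads_per_bank(int banks, int threads, int row_width, int col) {
    assert(banks > 0 && threads > 0 && row_width > 0 && col >= 0);
    std::vector<int> hits(banks, 0);
    for (int t = 0; t < threads; ++t) {
        int word = t * row_width + col;  // linear word index of [t][col]
        ++hits[word % banks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

With 16 banks and 16 threads, a row width of 2*TILE_WIDTH = 32 puts every thread in the same bank (a result of 16), while the padded width of 33 spreads them over 16 different banks (a result of 1).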
Upvotes: 0
Reputation: 1839
For 32-bit memory accesses you can use the default access pattern:
__shared__ int shared[32];
int data = shared[base + stride * tid];
where stride is odd.
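The odd-stride condition follows from simple arithmetic: when as many threads as there are banks read word base + stride * tid, each bank that is touched is hit by gcd(stride, banks) threads, and because the bank count is a power of two, any odd stride gives a gcd of 1. A minimal host-side sketch of that calculation, with hypothetical helper names and the assumption that word i maps to bank i % banks:

```cpp
#include <cassert>

// Greatest common divisor via Euclid's algorithm.
int gcd_int(int a, int b) {
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Number of threads sharing each touched bank when `banks` threads access
// word (base + stride * tid). A result of 1 means no bank conflict.
int conflict_degree(int banks, int stride) {
    assert(banks > 0 && stride > 0);
    return gcd_int(stride, banks);
}
```

For example, strides 1 and 3 give a degree of 1 on 32 banks, stride 2 gives a 2-way conflict, and stride 32 sends every thread to the same bank.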
If you have 64-bit accesses, you can use a trick like this:
struct type
{
int x, y, z;
};
__shared__ struct type shared[32];
struct type data = shared[base + tid];
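One way to read the struct above is that the third member widens each element from two 32-bit words to three, which turns an even stride into an odd one. The host-side sketch below illustrates that reading under two assumptions: each 32-bit member is fetched as a separate access, and word i maps to bank i % banks. The struct names `Pair` and `Padded` and the function `distinct_banks` are hypothetical, and `int` is assumed to be 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Pair   { int x, y; };     // 64-bit payload: 2 words per element
struct Padded { int x, y, z; };  // same payload plus one extra word (z)

// Counts the distinct banks touched when `threads` threads each read the
// first word of element tid in an array whose elements are elem_bytes wide.
int distinct_banks(int banks, int threads, std::size_t elem_bytes) {
    assert(banks > 0 && threads > 0);
    assert(elem_bytes % 4 == 0);  // elements are whole 32-bit words
    int words = static_cast<int>(elem_bytes / 4);
    std::set<int> used;
    for (int tid = 0; tid < threads; ++tid) {
        used.insert((tid * words) % banks);
    }
    return static_cast<int>(used.size());
}
```

With 32 threads and 32 banks, `distinct_banks(32, 32, sizeof(Pair))` is 16, meaning a 2-way conflict, while `distinct_banks(32, 32, sizeof(Padded))` is 32, meaning every thread gets its own bank. The cost is one unused word per element.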
Upvotes: 2