Bank-Conflict-Free Access in shared memory

I need to use a shared memory array of 64 elements, which is twice the number of banks and twice the number of threads in a warp. How should I index it to get bank-conflict-free access?

Upvotes: 0

Answers (2)

P O'Conbhui

Reputation: 1223

Let's assume you're using compute capability 1.x, so your shared memory has 16 banks, conflicts are resolved per half-warp of 16 threads, and each thread has to access 2 elements in shared memory.

What you want is for each thread to hit the same bank for both of its elements, while different threads hit different banks. If you organize the data so that a thread's two elements are 16 words apart, both land in that thread's bank and you should avoid bank conflicts.

__shared__ int shared[32];
int data0 = shared[base + stride * tid];       // first element
int data1 = shared[base + stride * tid + 16];  // second element, 16 words later, same bank
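To check a layout like this without a GPU, here is a minimal host-side sketch. It assumes the usual model where the bank of a 32-bit word is its index modulo the number of banks, and it ignores the broadcast case where several threads read the same word. The names `worstBankLoad` and `stridedAccess` are hypothetical helpers, not CUDA APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed model: bank of a 32-bit word = wordIndex % numBanks.
// Returns the largest number of threads that land in one bank for a single
// access (1 means conflict-free). Broadcast reads are not modelled.
int worstBankLoad(const std::vector<int>& wordIndex, int numBanks) {
    assert(numBanks > 0);
    std::vector<int> load(numBanks, 0);
    for (int idx : wordIndex) {
        ++load[idx % numBanks];
    }
    return *std::max_element(load.begin(), load.end());
}

// Word indices read by `threads` threads for shared[stride * tid + offset].
std::vector<int> stridedAccess(int threads, int stride, int offset) {
    std::vector<int> idx(threads);
    for (int tid = 0; tid < threads; ++tid) {
        idx[tid] = stride * tid + offset;
    }
    return idx;
}
```

With 16 threads and 16 banks, `worstBankLoad(stridedAccess(16, 1, 0), 16)` and `worstBankLoad(stridedAccess(16, 1, 16), 16)` both give 1, so the two reads `shared[tid]` and `shared[tid + 16]` are each conflict-free under this model. Storing a thread's two elements next to each other instead (`stridedAccess(16, 2, 0)`) gives 2, a 2-way conflict.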

I used this pattern for storing complex floats, but I had a 2D array of complex floats, so it looked like this (row and col being the thread's position in the tile):

#define TILE_WIDTH 16

__shared__ float shared[TILE_WIDTH][2*TILE_WIDTH + 1];
float real = shared[row][col];               // real parts in columns 0..TILE_WIDTH-1
float imag = shared[row][col + TILE_WIDTH];  // imaginary parts TILE_WIDTH words later

The +1 pads each row by one word so that transposed (column-wise) accesses fall in different banks instead of being serialized.
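The effect of the padding can be checked with a small host-side sketch. It assumes 16 banks, a bank index of word index modulo 16, and a column-wise read where thread tid reads row tid of the tile. `columnReadConflict` is a hypothetical helper for illustration, not part of CUDA.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Worst-case number of threads sharing a bank when thread tid reads
// tile[tid][col] from a tile whose rows are rowPitch 32-bit words long.
// Assumed model: bank = wordIndex % numBanks; 1 means conflict-free.
int columnReadConflict(int threads, int rowPitch, int col, int numBanks) {
    assert(numBanks > 0 && rowPitch > 0 && col >= 0);
    std::vector<int> load(numBanks, 0);
    for (int tid = 0; tid < threads; ++tid) {
        int wordIndex = tid * rowPitch + col;  // row-major layout
        ++load[wordIndex % numBanks];
    }
    return *std::max_element(load.begin(), load.end());
}
```

Without padding, the row pitch is 2*TILE_WIDTH = 32 words, and `columnReadConflict(16, 32, 0, 16)` gives 16: every thread lands in the same bank. With the +1 the pitch is 33 words, and `columnReadConflict(16, 33, 0, 16)` gives 1, because consecutive rows are shifted by one bank.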

Upvotes: 0

geek

Reputation: 1839

For 32-bit accesses you can use the default access pattern:

__shared__ int shared[32];
int data = shared[base + stride * tid];

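where stride is odd. An odd stride shares no common factor with the number of banks, which is a power of two, so the threads map to distinct banks.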
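A quick way to convince yourself of the odd-stride rule is to count bank hits on the host. This is only a sketch under the assumption that the bank of a 32-bit word is its index modulo the number of banks; `stridedReadConflict` is a hypothetical helper, not a CUDA function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Worst-case number of threads sharing a bank for shared[base + stride * tid].
// Assumed model: bank = wordIndex % numBanks; 1 means conflict-free.
int stridedReadConflict(int threads, int base, int stride, int numBanks) {
    assert(numBanks > 0 && base >= 0 && stride >= 0);
    std::vector<int> load(numBanks, 0);
    for (int tid = 0; tid < threads; ++tid) {
        ++load[(base + stride * tid) % numBanks];
    }
    return *std::max_element(load.begin(), load.end());
}
```

For 16 threads and 16 banks, strides 1 and 3 give 1 (no conflict), while stride 2 gives 2 and stride 4 gives 4. The same holds for a full warp on 32 banks: `stridedReadConflict(32, 0, 3, 32)` is 1 and `stridedReadConflict(32, 0, 2, 32)` is 2.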

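If you have 64-bit accesses, you can use a trick like the one below. A struct of three 32-bit words gives a stride of three words between threads, which is odd, so each member read is conflict-free: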

struct type 
{  
   int x, y, z;
};
__shared__ struct type shared[32];
struct type data = shared[base + tid];

Upvotes: 2
