Karl
Karl

Reputation: 31

shared memory bank conflict with char array

I understand the bank conflict when dealing with 4-byte data types, but I wonder if we get any bank conflict (4-way/8-way?) with the following code

__shared__ char shared[];
foo = shared[threadIdx.x];

The above code leads to 4 consecutive threads in a warp accessing the same word address in the same bank.

Will a similar memory access pattern lead to a bank conflict for any CUDA device family? Apparently, it does so for old cards only but I want to confirm.

My question can be generalized further: What if a number of threads access the same bank addressable unit [8-byte or 4-byte] but each requires a fraction of it. will the hardware handle such requests without any bank conflict? Thank you

Upvotes: 3

Views: 430

Answers (1)

Robert Crovella
Robert Crovella

Reputation: 151869

All cc2.0 and newer GPU devices have a broadcast mechanism such that for any number of threads participating in a warp request which are accessing a given 32-bit aligned location or any portion of that location (or multiple groups of such threads, each group accessing a given 32-bit aligned location or any portion of that location), the threads in that group will be serviced in a single transaction, without serialization.

From the documentation:

A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank): In that case, for read accesses, the word is broadcast to the requesting threads (multiple words can be broadcast in a single transaction) and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined).

For the devices that support 8-byte-bank-mode, and which are in 8-byte-bank-mode, the above broadcast mechanism generalizes to a 64-bit aligned location.

Note that I've chosen my wording here carefully. Suppose I have two such broadcast groups in a single warp request. Now also suppose that those groups are addressing two different locations, but two locations in the same bank. E.g. group A is targetting address 0, and group B is targetting address 1024. In this case, all threads participating in group A will be serviced in a single transaction, and all threads participating in group B will be serviced in a single transaction, but those two groups will be serialized with respect to each other.

On the other hand, if group targetted address 0 and group B targetted address 8, those are in separate banks, so all of the threads in group A and all of the threads in group B could be serviced in a single transaction, because multiple broadcasts per request are permitted.

Upvotes: 2

Related Questions