Behzad Baghapour

Reputation: 169

Shared and Global memory accesses

As far as I can tell, for "global" memory the key to efficient transactions is coalescing, i.e. the threads of a warp accessing neighboring memory addresses, while for "shared" memory the key is that the addresses issued by the threads do not conflict. Did I understand this correctly?

Upvotes: 1

Views: 290

Answers (1)

pQB

Reputation: 3127

From NVIDIA CUDA Programming guide:

To maximize global memory throughput, it is therefore important to maximize coalescing by:

  • Following the most optimal access patterns based on Sections G.3.2 and G.4.2,
  • Using data types that meet the size and alignment requirement detailed in Section 5.3.2.1.1,
  • Padding data in some cases, for example, when accessing a two-dimensional array as described in Section 5.3.2.1.2.

This refers to how the memory accesses of the threads in a warp are coalesced ('packed') into one or more transactions. The requirements for coalescing have been relaxed on devices of compute capability 2.x.
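To make the coalescing point concrete, here is a minimal sketch that contrasts a contiguous read with a strided one. The kernel names (`readCoalesced`, `readStrided`), the array size and the stride of 32 are my own choices for illustration, not something from the guide, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Coalesced: thread i reads element i, so the threads of a warp
// touch one contiguous range of addresses.
__global__ void readCoalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: neighboring threads read addresses `stride` elements apart,
// so the request of one warp is spread over many memory segments.
__global__ void readStrided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(size_t)i * stride];
}

int main() {
    const int n = 1 << 18;      // number of output elements (illustrative)
    const int stride = 32;      // illustrative stride, in elements
    std::vector<float> h_in((size_t)n * stride);
    for (size_t k = 0; k < h_in.size(); ++k) h_in[k] = (float)(k % 1000);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<float> h_out(n);
    int errors = 0;

    // Contiguous access pattern.
    readCoalesced<<<blocks, threads>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) errors += (h_out[i] != h_in[i]);

    // Strided access pattern: same amount of useful data, scattered addresses.
    readStrided<<<blocks, threads>>>(d_in, d_out, n, stride);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) errors += (h_out[i] != h_in[(size_t)i * stride]);

    printf("mismatches: %d\n", errors);
    cudaFree(d_in);
    cudaFree(d_out);
    return errors != 0;
}
```

Both kernels move the same number of useful floats; only the address pattern differs. Running each under a profiler should show the strided version issuing more memory transactions per request, though how large the gap is depends on the device.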

On the other hand, for shared memory accesses you need to understand how this memory is implemented.

To achieve high bandwidth, shared memory is divided into equally-sized memory modules, called banks, which can be accessed simultaneously.

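If two or more threads of the same warp (half-warp on compute capability 1.x) access different addresses that fall in the same bank, the accesses are serialized. This is known as a bank conflict.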
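A common way to see (and avoid) bank conflicts is a shared-memory tile that is read column-wise. The sketch below is only an illustration under my own assumptions: the kernel name `transposeTile`, the 32x32 tile and the one-word padding are hypothetical choices, and it assumes banks that are one 4-byte word wide (16 or 32 banks depending on the compute capability).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32

// Transposes a single TILE x TILE matrix through shared memory.
// PAD = 0: the column read tile[x][y] has a stride of TILE words between
//          neighboring threads, so they all map to the same bank.
// PAD = 1: each row is TILE + 1 words long, which shifts every row by one
//          bank, so neighboring threads map to different banks.
template <int PAD>
__global__ void transposeTile(const float* in, float* out) {
    __shared__ float tile[TILE][TILE + PAD];
    int x = threadIdx.x, y = threadIdx.y;
    tile[y][x] = in[y * TILE + x];   // row-wise write: consecutive words
    __syncthreads();
    out[y * TILE + x] = tile[x][y];  // column-wise read
}

// Counts elements of `out` that are not the transpose of `in`.
static int countErrors(const std::vector<float>& in, const std::vector<float>& out) {
    int errors = 0;
    for (int y = 0; y < TILE; ++y)
        for (int x = 0; x < TILE; ++x)
            errors += (out[y * TILE + x] != in[x * TILE + y]);
    return errors;
}

int main() {
    const int n = TILE * TILE;
    std::vector<float> h_in(n), h_out(n);
    for (int k = 0; k < n; ++k) h_in[k] = (float)k;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);  // one thread per element (needs 1024 threads per block)

    // Unpadded tile: correct result, but the column read conflicts.
    transposeTile<0><<<1, block>>>(d_in, d_out);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    int errors = countErrors(h_in, h_out);

    // Padded tile: same result, column read spread across banks.
    transposeTile<1><<<1, block>>>(d_in, d_out);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    errors += countErrors(h_in, h_out);

    printf("mismatches: %d\n", errors);
    cudaFree(d_in);
    cudaFree(d_out);
    return errors != 0;
}
```

Both variants compute the same transpose; the padding changes only which bank each word lands in. The block size of 1024 threads assumes a device that allows it, so on older hardware a smaller tile would be needed.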

Appendix G ("Compute Capabilities") of the guide has more information about the architecture.

Regards!

Upvotes: 1
