greenoldman
greenoldman

Reputation: 21052

Is coalescing triggered for accessing memory in reverse order?

Let's say I have several threads and they access memory at addresses A+0, A+4, A+8, A+12 (each access = next thread). Such access is coalesced, right?

However if I have access the same memory but in reverse order, meaning:

thread 0 -> A+12
thread 1 -> A+8
thread 2 -> A+4
thread 3 -> A+0

Is coalescing here also triggered?

Upvotes: 5

Views: 533

Answers (2)

All The Rage
All The Rage

Reputation: 831

It's also worth noting that a main purpose of the L2 cache in an Nvidia GPU is to collapse reads and coalesce writes. So if one warp was accessing

thread 0 -> A+0
thread 1 -> A+8
thread 2 -> A+16
thread 3 -> A+24
...

and another warp was accessing

thread 0 -> A+4
thread 1 -> A+12
thread 2 -> A+20
thread 3 -> A+28
...

these two accesses will not coalesce inside the SM but generally will coalesce in the L2 cache, so that GPU memory will only be touched once.

Upvotes: 2

Robert Crovella
Robert Crovella

Reputation: 151799

Yes, for cc 2.0 and newer GPUs, coalescing will occur for any random arrangement of 32 bit data elements to threads, as long as all the requested 32-bit data elements are coming from (requested from) the same 128 byte (and 128 byte aligned) region in global memory.

The GPU has something like a "crossbar switch" in the memory controller that will distribute elements as needed. You may be interested in this GPU webinar which discusses coalescing and will illustrate this particular case pictorially (on slide 12).

The NVIDIA webinar page has other useful webinars you may be interested in as well.

For pre-cc2.0 devices the specifics vary by compute capability, but compute 1.0 and 1.1 capable devices do not have this ability to coalesce reads that are in "reverse order" or random order.

Upvotes: 11

Related Questions