Reputation: 359
Given the following low-level (SASS) instructions on the two latest generations of NVIDIA GPUs (ref. http://docs.nvidia.com/cuda/cuda-binary-utilities/index.html), what are the (perhaps speculated) differences in the hardware / memory hierarchy design, and what are the performance implications?
Surface Memory Instructions MAXWELL
SUATOM Surface Reduction
SULD Surface Load
SURED Atomic Reduction on surface memory
SUST Surface Store
Surface Memory Instructions KEPLER
SUCLAMP Surface Clamp
SUBFM Surface Bit Field Merge
SUEAU Surface Effective Address
SULDGA Surface Load Generic Address
SUSTGA Surface Store Generic Address
Upvotes: 0
Views: 481
Reputation: 4422
CUDA arrays wrap NVIDIA's proprietary array layouts, which are optimized for 2D and 3D locality. The translation from coordinates to an address is intentionally obfuscated from developers, since it may change from one architecture to the next. It looks like NVIDIA chose to implement this translation differently on Kepler and Maxwell, with Kepler taking a more "RISC-like" approach. The SASS disassembly of the surf2Dmemset sample from the CUDA Handbook (https://github.com/ArchaeaSoftware/cudahandbook/blob/master/texturing/surf2Dmemset.cu) shows six instructions to write the output on Kepler:
SUCLAMP PT, R8, R7, c[0x0][0x164], 0x0;
SUCLAMP.SD.R4 PT, R6, R6, c[0x0][0x15c], 0x0;
IMADSP.SD R9, R8, c[0x0][0x160], R6;
SUBFM P0, R8, R6, R8, R9;
SUEAU R9, R9, R8, c[0x0][0x154];
SUSTGA.B.32.TRAP.U8 [R8], c[0x0][0x158], R10, P0;
as compared to one for Maxwell:
SUST.D.BA.2D.TRAP [R2], R8, 0x55;
The "EA" in the Kepler instructions stands for "effective address"; it is a more complicated variant of the LEA (load effective address) instruction found in CISC instruction sets.
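For context, the source-level operation behind both listings is a single surface write. The following is a minimal sketch, not the CUDA Handbook's exact code: it uses the surface object API rather than whatever the sample used at the time, and the names `surfaceFill2D` and `fillSurfaceAndCheck` are made up for illustration. The exact SASS you get depends on the target architecture and toolkit version, so inspect it yourself with `cuobjdump -sass`.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical kernel: writes `value` to every element of a 2D float surface.
__global__ void surfaceFill2D(cudaSurfaceObject_t surf, int width, int height, float value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // The x coordinate is a byte offset, y is a row index.
        // cudaBoundaryModeTrap is the default; it is spelled out here because
        // the SASS listings above carry a .TRAP modifier.
        surf2Dwrite(value, surf, x * (int)sizeof(float), y, cudaBoundaryModeTrap);
    }
}

// Hypothetical host helper: allocates a CUDA array, fills it through a surface
// object, reads it back, and returns true if every element equals `value`.
bool fillSurfaceAndCheck(int width, int height, float value)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    if (cudaMallocArray(&array, &desc, width, height, cudaArraySurfaceLoadStore) != cudaSuccess)
        return false;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = array;

    cudaSurfaceObject_t surf = 0;
    if (cudaCreateSurfaceObject(&surf, &res) != cudaSuccess) {
        cudaFreeArray(array);
        return false;
    }

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    surfaceFill2D<<<grid, block>>>(surf, width, height, value);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // Copy back through the runtime, which undoes the opaque array layout.
    std::vector<float> host(static_cast<size_t>(width) * height, 0.0f);
    ok = ok && (cudaMemcpy2DFromArray(host.data(), width * sizeof(float), array,
                                      0, 0, width * sizeof(float), height,
                                      cudaMemcpyDeviceToHost) == cudaSuccess);
    for (float v : host)
        ok = ok && (v == value);

    cudaDestroySurfaceObject(surf);
    cudaFreeArray(array);
    return ok;
}
```

Note that the kernel only ever supplies coordinates; the address computation that Kepler spells out over several instructions is never visible at this level, which is what lets the layout change between architectures.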
As for SURED/SUATOM, those must be the surface equivalents of GRED/GATOM. Both perform atomic operations, but the ATOM variants return the previous value of the memory location and the RED variants do not. They don't need different intrinsics; the compiler emits the correct instruction automatically, based on whether the return value is used.
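The ATOM/RED distinction is easiest to see with ordinary global-memory atomics, which is the analogous case here. A minimal sketch, assuming a reasonably recent toolkit; the kernel and helper names are hypothetical. Both kernels call the same `atomicAdd` intrinsic, and only the second one consumes the returned value. Which instruction the compiler actually picks is its decision, so check the output of `cuobjdump -sass` rather than taking this for granted.

```cuda
#include <cuda_runtime.h>

// Return value discarded: the compiler is free to emit a reduction (RED).
__global__ void countAll(unsigned int* counter)
{
    atomicAdd(counter, 1u);
}

// Return value used: the previous value must come back, which needs ATOM.
__global__ void takeTickets(unsigned int* counter, unsigned int* tickets)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    tickets[i] = atomicAdd(counter, 1u);
}

// Hypothetical host helper: runs both kernels on one counter and returns its
// final value, or 0xFFFFFFFF if a CUDA call fails.
unsigned int runCounters(int blocks, int threads)
{
    const unsigned int kError = 0xFFFFFFFFu;
    unsigned int* counter = nullptr;
    unsigned int* tickets = nullptr;

    if (cudaMalloc(&counter, sizeof(unsigned int)) != cudaSuccess)
        return kError;
    if (cudaMalloc(&tickets, sizeof(unsigned int) * blocks * threads) != cudaSuccess) {
        cudaFree(counter);
        return kError;
    }
    cudaMemset(counter, 0, sizeof(unsigned int));

    countAll<<<blocks, threads>>>(counter);
    takeTickets<<<blocks, threads>>>(counter, tickets);

    // cudaMemcpy on the default stream waits for both kernels to finish.
    unsigned int total = kError;
    if (cudaMemcpy(&total, counter, sizeof(total), cudaMemcpyDeviceToHost) != cudaSuccess)
        total = kError;

    cudaFree(tickets);
    cudaFree(counter);
    return total;
}
```

Each thread in each kernel adds one, so `runCounters(4, 64)` should come back as 2 * 4 * 64 = 512 regardless of which instruction was emitted.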
Upvotes: 3