IADD.X GPU instruction

Question

When looking into the SASS output generated for the NVIDIA Fermi architecture, the instruction IADD.X is observed. From NVIDIA documentation, IADD means integer add, but not understanding what it means by IADD.X. Can somebody please help... Is this meaning an integer addition with extended number of bits?

The instruction snippet is:

IADD.X R5, R3, c[0x0][0x24];   /* 0x4800400090315c43 */

njuffa · Accepted Answer

Yes, the .X stands for eXtended precision. You will see IADD.X used together with IADD.CC, where the latter adds the less significant bits, and produces a carry flag (thus the .CC), and this carry flag is then incorporated into addition of the more significant bits performed by IADD.X.

Since NVIDIA GPUs are basically 32-bit processors with 64-bit addressing capability, a frequent use of this idiom is in address (pointer) arithmetic. The use of 64-bit integer types, such as long long int or uint64_t will likewise lead to the use of these instructions.

Here is a worked example of a kernel doing 64-bit integer addition. This CUDA code was compiled for compute capability 3.5 with CUDA 7.5, and the machine code dumped with cuobjdump --dump-sass.

__global__ void addint64 (long long int a, long long int b, long long int *res)
{
    *res = a + b;
}

MOV     R1, c[0x0][0x44];         
MOV     R2, c[0x0][0x148];        // b[31:0]
MOV     R0, c[0x0][0x14c];        // b[63:32]
IADD    R4.CC, R2, c[0x0][0x140]; // tmp[31:0] = b[31:0] + a[31:0]; carry-out
MOV     R2, c[0x0][0x150];        // res[31:0]
MOV     R3, c[0x0][0x154];        // res[63:32]
IADD.X  R5, R0, c[0x0][0x144];    // tmp[63:32] = b[63:32] + a[63:32] + carry-in
ST.E.64 [R2], R4;                 // [res] = tmp[63:0]
EXIT

IADD.X GPU instruction

Answers (1)

Related Questions