gpunerd

Reputation: 151

Syntax on inline PTX code for CUDA

According to NVIDIA's Inline PTX Assembly document, the syntax for inline assembly is: asm("temp_string" : "constraint"(output) : "constraint"(input));
Here are two examples:
asm("vadd.s32.s32.s32 %0, %1.h0, %2.h0;" : "=r"(v) : "r"(a), "r"(b));
asm("vadd.u32.u32.u32 %0.b0, %1, %2, %3;" : "=r"(v) : "r"(a), "r"(b), "r"(z));
In both examples, a suffix such as .h0 or .b0 follows a %n operand. I looked through the official CUDA documentation and found nothing about the meaning of h0 or b0. I have seen h0 and h1, and b0, b1, b2 and b3. My guess is that h0 and h1 select a 16-bit value, while bn selects a byte. Does anyone know the exact meaning of these?

Thanks to Roger Dahl for the help. I read the PTX ISA 3.0 document and found the answer.
"h" means half-word: h0 is the low half-word (bits 15..0) of a 32-bit word, and h1 is the high half-word (bits 31..16). "b" means byte: b0, b1, b2 and b3 are the lowest byte (bits 7..0), the second byte (bits 15..8), the third byte (bits 23..16) and the highest byte (bits 31..24) of a 32-bit word.
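To make the bit positions concrete, here is a minimal host-side C++ sketch of what the source selectors pick out. The helper names (halfword, byteOf, halfwordSigned) are hypothetical and only mirror the .h0/.h1 and .b0-.b3 suffixes; the sketch emulates the selection with ordinary shifts and masks and is not the PTX implementation itself.

```cpp
#include <cstdint>

// Hypothetical helpers that mirror the PTX source selectors.

// Half-word n of a 32-bit word, zero-extended (n = 0 -> .h0, n = 1 -> .h1).
constexpr std::uint32_t halfword(std::uint32_t w, unsigned n) {
    return (w >> (16u * n)) & 0xFFFFu;
}

// Byte n of a 32-bit word, zero-extended (n = 0 -> .b0 ... n = 3 -> .b3).
constexpr std::uint32_t byteOf(std::uint32_t w, unsigned n) {
    return (w >> (8u * n)) & 0xFFu;
}

// Half-word n sign-extended to 32 bits, which is how a signed operand
// type such as .s32 would interpret the selected half-word.
constexpr std::int32_t halfwordSigned(std::uint32_t w, unsigned n) {
    return static_cast<std::int32_t>(halfword(w, n) ^ 0x8000u) - 0x8000;
}

// Compile-time checks against the word 0xAABBCCDD.
static_assert(halfword(0xAABBCCDDu, 0) == 0xCCDDu, ".h0 is the low half-word");
static_assert(halfword(0xAABBCCDDu, 1) == 0xAABBu, ".h1 is the high half-word");
static_assert(byteOf(0xAABBCCDDu, 0) == 0xDDu, ".b0 is the lowest byte");
static_assert(byteOf(0xAABBCCDDu, 3) == 0xAAu, ".b3 is the highest byte");
```

Under this reading, the first example (vadd.s32.s32.s32 %0, %1.h0, %2.h0) adds the sign-extended low half-words of a and b, that is, halfwordSigned(a, 0) + halfwordSigned(b, 0).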

Upvotes: 1

Views: 1535

Answers (1)

Roger Dahl

Reputation: 15734

vadd is one of the video-specific instructions included in PTX. A description of the complete PTX ISA ships with the CUDA distribution. On my machine, it's in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\doc\ptx_isa_3.0.pdf. The description of the h0, h1, b0, etc. designators is in section 8.7.11, Video Instructions. They represent different implicit shift/mask operations (see the optMerge function).
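As a rough illustration of the shift/mask idea on the destination side, here is a host-side C++ sketch of a merge in the spirit of optMerge. It follows my reading of the PTX ISA pseudocode, so treat the exact masks as an assumption and check them against your ISA version. The names Sel, shiftOf, maskOf, mergeInto and vaddMergeB0 are hypothetical, and saturation is left out.

```cpp
#include <cstdint>

// Hypothetical selector enum; the names follow the PTX suffixes.
enum class Sel : unsigned { B0 = 0, B1 = 1, B2 = 2, B3 = 3, H0 = 4, H1 = 5 };

constexpr bool isHalf(Sel s) { return s == Sel::H0 || s == Sel::H1; }

// Bit offset of the selected byte or half-word inside the 32-bit word.
constexpr unsigned shiftOf(Sel s) {
    return isHalf(s) ? 16u * (static_cast<unsigned>(s) - 4u)
                     : 8u * static_cast<unsigned>(s);
}

// Mask covering exactly the selected byte or half-word.
constexpr std::uint32_t maskOf(Sel s) {
    return (isHalf(s) ? 0xFFFFu : 0xFFu) << shiftOf(s);
}

// Place the low bits of tmp at the position named by dsel and take
// every other bit from c (assumed merge behaviour, no saturation).
constexpr std::uint32_t mergeInto(Sel dsel, std::uint32_t tmp, std::uint32_t c) {
    return ((tmp << shiftOf(dsel)) & maskOf(dsel)) | (c & ~maskOf(dsel));
}

// Emulates "vadd.u32.u32.u32 %0.b0, %1, %2, %3;" under the same assumption:
// the low byte of a + b replaces byte 0 of z.
constexpr std::uint32_t vaddMergeB0(std::uint32_t a, std::uint32_t b, std::uint32_t z) {
    return mergeInto(Sel::B0, a + b, z);
}
```

For example, mergeInto(Sel::B0, 0x12345678, 0xAABBCCDD) keeps the upper three bytes of the last argument and replaces only the lowest byte, giving 0xAABBCC78.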

Upvotes: 2
