Reputation: 351
What is the HIP-Clang equivalent of this CUDA function?
__device__ __forceinline__ uint32_t add_cc(uint32_t a, uint32_t b)
{
    uint32_t r;
    asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}
I'm porting a CUDA project that contains inline PTX assembly to HIP-Clang. The function is used to implement multi-precision addition on NVIDIA GPUs. I tried:
asm volatile ("add.cc.u32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b)); //invalid operand for instruction
asm volatile ("V_ADD_CO_U32 %0, %1, %2;" : "=v"(r) : "v"(a), "v"(b)); //operands are not valid for this GPU or mode
The target hardware is an RX 6800, and the compiler is AMD clang version 14.0.0.
Is RDNA2 the correct instruction set reference?
Is this LLVM user guide for the AMDGPU backend an applicable reference?
Upvotes: 3
Views: 1150
Reputation: 351
It turns out the answer is hardware-dependent. For my hardware, for which the compiler defines __gfx1030__, the correct syntax is:
asm volatile ("v_add_co_u32 %0, vcc_lo, %1, %2;" : "=v"(r) : "v"(a), "v"(b));
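For earlier architectures such as __gfx900__, replace vcc_lo with vcc. (gfx900 executes 64-wide wavefronts, so the carry mask is the full 64-bit vcc, whereas gfx1030 code is compiled for 32-wide wavefronts by default and uses the 32-bit vcc_lo.)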
See the discussion on the ROCm HIP GitHub and this AMD GPU assembly reference.
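For checking the results of an inline-assembly port like this, a portable reference that computes the same sum and carry without any assembly can be handy. The following is only a minimal host-side sketch: the names add_carry_u32 and mp_add are hypothetical and do not come from the original project, and the code makes no attempt to reproduce how the carry is held in vcc between instructions. Under HIP it would presumably also need __device__ qualifiers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original project): computes
// a + b + carry_in, returns the low 32 bits, and writes the
// carry-out (0 or 1) to *carry_out.
inline uint32_t add_carry_u32(uint32_t a, uint32_t b, uint32_t carry_in,
                              uint32_t* carry_out)
{
    // A 64-bit intermediate cannot overflow for 32-bit inputs plus a 0/1 carry.
    uint64_t wide = static_cast<uint64_t>(a) + b + carry_in;
    *carry_out = static_cast<uint32_t>(wide >> 32);
    return static_cast<uint32_t>(wide);
}

// Hypothetical multi-precision add: limbs are stored least significant
// first. Writes n limbs to r and returns the carry out of the top limb.
inline uint32_t mp_add(uint32_t* r, const uint32_t* a, const uint32_t* b,
                       std::size_t n)
{
    uint32_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // carry is read (by value) before the call overwrites it.
        r[i] = add_carry_u32(a[i], b[i], carry, &carry);
    }
    return carry;
}
```

Comparing the output of the device kernel against mp_add on the host for a few inputs that overflow a limb (for example 0xFFFFFFFF + 1) is one way to confirm that the carry is being propagated.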
Upvotes: 4