Ginu Jacob

Reputation: 1778

Definitions for load and store operations on CUDA memory types (e.g. shared, global) in llvm

In the LLVM source code file llvm/lib/Target/NVPTX/NVPTXIntrinsics.td, the definitions for atom_add, atom_sub, atom_max, atom_min, atom_inc, atom_dec, etc. on CUDA memory types can be seen. But I was unable to find the load and store operations on CUDA memories anywhere in these files. So where are the load and store operations defined for CUDA memory types in LLVM?

Upvotes: 1

Views: 143

Answers (1)

Michael Haidl

Reputation: 5482

You won't find them as intrinsics because there are no intrinsics for loads and stores to the CUDA memory hierarchy. The NVPTX backend uses the address space on the pointer operand of a load or store instruction to determine which PTX instruction should be generated.

A load through a pointer in address space 1 (global memory) will translate to ld.global.<type>, while a load through a pointer in address space 3 (shared memory) will result in a ld.shared.<type> instruction. A load through a generic pointer, i.e., a pointer in address space 0, will result in a plain ld.<type> instruction.
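A minimal sketch of what this looks like in LLVM IR (function and value names are made up for illustration; recent LLVM releases use opaque `ptr`, older ones used typed pointers like `float addrspace(1)*`):

```llvm
target triple = "nvptx64-nvidia-cuda"

define float @load_examples(ptr %gen, ptr addrspace(1) %glob, ptr addrspace(3) %shared) {
entry:
  ; generic pointer (address space 0) -> roughly ld.f32
  %a = load float, ptr %gen
  ; global pointer (address space 1)  -> roughly ld.global.f32
  %b = load float, ptr addrspace(1) %glob
  ; shared pointer (address space 3)  -> roughly ld.shared.f32
  %c = load float, ptr addrspace(3) %shared
  %ab = fadd float %a, %b
  %abc = fadd float %ab, %c
  ret float %abc
}
```

Note that the IR-level load instruction is the same in all three cases; only the address space attached to the pointer type differs.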

This translation happens during instruction selection in the NVPTX backend. Have a look at ./llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp to find out how instruction selection in NVPTX happens. For example, load instructions are handled in SDNode *NVPTXDAGToDAGISel::SelectLoad(SDNode *N).
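You can watch the selection happen yourself by running the IR above through llc with the NVPTX target; something like the following (the exact -mcpu value is just an example) prints the generated PTX to stdout:

```
llc -march=nvptx64 -mcpu=sm_50 load_examples.ll -o -
```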

Upvotes: 2
