Compiling CUDA PTX to binary for an older target

Question

From the linked question I understand that PTX is portable across architectures. I believe this allows migrating forward, e.g. from sm_20 to sm_30. I have a special use case that goes the other way, from sm_20 to sm_10. Is it possible to generate a binary such as a cubin for an sm_10 target from PTX that was compiled for an sm_20 target?

Tim · Accepted Answer

PTX is forward compatible when compiled against a specific architecture (i.e., using the sm_* flag), but it is not backward compatible. One way to work around this is to specify a single virtual architecture and then generate binary images for every real architecture you want to target. For example,

nvcc -arch=compute_20 -code=sm_20,sm_30,sm_35

generates PTX for the compute 2.0 virtual architecture and binary images for 2.0, 3.0, and 3.5 devices. This is known as the fat binary approach. Please note that compute 1.0 is deprecated as of CUDA 7.0.

See the code generation options for the difference between real and virtual architectures.

EDIT: Actually, it's a bit redundant to specify -arch=compute_35 and -code=sm_35 because the JIT compiler would have intervened and built it for you. As long as you don't mind a little extra fat in your fat binary, then I suppose it doesn't matter too much.

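EDIT2: Every target listed in -code must be greater than or equal to the one given in -arch, because PTX is not backward compatible. That means PTX generated for compute_20 cannot be turned into an sm_10 binary. Thanks to Robert Crovella for pointing out that stupid mistake.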
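
To make the rule in EDIT2 concrete, here is a minimal Python sketch that checks a set of -code targets against an -arch value before you hand them to nvcc. The helper names capability and incompatible_targets are hypothetical (they are not part of nvcc or the CUDA toolkit), and the check assumes plain target names of the form compute_NN or sm_NN, compared only by their numeric part. It is an illustration of the "code >= arch" rule, not a full model of nvcc's compatibility checks.

```python
import re


def capability(name):
    """Return the numeric part of a target name such as 'compute_20' or 'sm_35'.

    Hypothetical helper: only plain compute_NN / sm_NN names are accepted.
    """
    match = re.fullmatch(r"(?:compute|sm)_(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized target: {name!r}")
    return int(match.group(1))


def incompatible_targets(arch, codes):
    """List the entries of `codes` that are older than the virtual architecture `arch`."""
    floor = capability(arch)
    return [code for code in codes if capability(code) < floor]


# The combination from the answer: every real target is >= compute_20.
print(incompatible_targets("compute_20", ["sm_20", "sm_30", "sm_35"]))

# The combination from the question: sm_10 is older than compute_20.
print(incompatible_targets("compute_20", ["sm_10", "sm_20"]))
```

An empty list means every real architecture is at least as new as the virtual one; anything returned is a target that the PTX cannot be compiled down to, which is the situation the question asks about.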
