David

Reputation: 272

How is the CUDA <<<...>>>() kernel launch syntax implemented?

CUDA kernels are launched with this syntax (at least in the runtime API):

mykernel<<<blocks, threads, shared_mem, stream>>>(args);

Is this implemented as a macro or is it special syntax that nvcc removes before handing host code off to gcc?

Upvotes: 5

Views: 10837

Answers (1)

Robert Crovella

Reputation: 151899

The nvcc preprocessing system eventually converts the <<<...>>> launch into a sequence of CUDA runtime library calls before handing the code off to the host compiler. The exact sequence of calls may change depending on the CUDA version.
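As a rough illustration of what "a sequence of runtime library calls" can mean, the same launch can be written by hand with the public cudaLaunchKernel runtime function. This is only a sketch and not the code nvcc actually generates, which differs and varies by CUDA version. The kernel body, the launch_and_check helper and the launch dimensions are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes `value` into its own slot.
__global__ void mykernel(int *out, int value)
{
    out[threadIdx.x] = value;
}

// Launches mykernel once with <<<...>>> and once through cudaLaunchKernel,
// then checks that each launch wrote the value it was given.
bool launch_and_check()
{
    const int threads = 4;
    cudaStream_t stream = 0;   // default stream
    size_t shared_mem = 0;     // no dynamic shared memory

    int *d_out = nullptr;
    if (cudaMalloc(&d_out, threads * sizeof(int)) != cudaSuccess) return false;

    // 1) Triple-chevron form, as in the question.
    mykernel<<<1, threads, shared_mem, stream>>>(d_out, 7);
    int a[threads] = {0};
    bool ok = cudaMemcpy(a, d_out, sizeof(a), cudaMemcpyDeviceToHost) == cudaSuccess;

    // 2) Hand-written equivalent: kernel arguments are passed as an array
    //    of pointers to each argument.
    int value = 9;
    void *args[] = { &d_out, &value };
    ok = ok && cudaLaunchKernel((const void *)mykernel, dim3(1), dim3(threads),
                                args, shared_mem, stream) == cudaSuccess;
    int b[threads] = {0};
    ok = ok && cudaMemcpy(b, d_out, sizeof(b), cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_out);

    for (int i = 0; i < threads; ++i)
        ok = ok && a[i] == 7 && b[i] == 9;
    return ok;
}

int main()
{
    std::printf("%s\n", launch_and_check() ? "both launches matched" : "mismatch or CUDA error");
    return 0;
}
```

Compiling this with nvcc and the --keep option lets you compare the hand-written call with whatever your CUDA version emits for the <<<...>>> line.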

You can inspect the intermediate files by passing the --keep option to nvcc (--verbose may also help with understanding the steps), and you can see a trace of the API calls issued for a kernel launch using one of the profilers, e.g. nvprof --print-api-trace ...

---EDIT---

To put this answer more concisely: nvcc directly modifies the host code to replace the <<<...>>> syntax before passing it to the host compiler (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#offline-compilation).

Upvotes: 9
