Reputation: 272
CUDA kernels are launched with this syntax (at least in the runtime API)
mykernel<<<blocks, threads, shared_mem, stream>>>(args);
Is this implemented as a macro or is it special syntax that nvcc removes before handing host code off to gcc?
Upvotes: 5
Views: 10837
Reputation: 151899
The nvcc
preprocessing system eventually converts it to a sequence of CUDA runtime library calls before handing the code off to the host code compiler for compilation. The exact sequence of calls may change depending on CUDA version.
You can inspect files using the --keep
option to nvcc
(and --verbose
may help with understanding as well), and you can also see a trace of API calls issued for a kernel call using one of the profilers e.g. nvprof --print-api-trace ...
---EDIT---
Just to make this answer more concise, nvcc directly modifies the host code to replace the <<<...>>> syntax before passing it off to the host compiler (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#offline-compilation)
Upvotes: 9