Reputation: 95
A while ago, I asked the following question:
Numba Cuda computation seems to be slower than sequential run. Did I do obvious mistakes?
It was about using the GPU for a simple calculation. I noticed that the calculation took very long. I received the answer that the problem was that the compilation was done at runtime, and that one can force the compilation to happen ahead of time by providing the data types in the cuda.jit decorator. This worked very well. However, I would like to read more about this and wondered whether there is an official source. I know that there is a section about ahead-of-time compilation in the Numba docs, but as far as I understand, the approach described there involves more steps, e.g. the creation of a separate module. As a result, a file is exported, which has to be imported later on in order to avoid just-in-time compilation.
Obviously it is also possible to skip this file export etc. by simply defining the data types. If this is also explained in the docs or in another official source, could somebody point me to it?
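For reference, here is a minimal sketch of what I mean by defining the data types (the kernel, signature, and sizes are just placeholders from my side):

```python
from numba import cuda
import numpy as np

# Providing the signature in the decorator makes Numba compile the
# kernel at definition time instead of on the first call.
@cuda.jit('void(float32[:], float32[:], float32[:])')
def add_kernel(a, b, out):
    i = cuda.grid(1)  # absolute index of this thread
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1024
a = np.arange(n, dtype=np.float32)
b = np.arange(n, dtype=np.float32)
out = np.zeros(n, dtype=np.float32)
add_kernel[(n + 255) // 256, 256](a, b, out)  # grid size, block size
```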
Upvotes: 0
Views: 215
Reputation: 50348
Eager compilation should not be confused with ahead-of-time compilation. The former compiles the function dynamically using the JIT compiler when the function is defined (to be more precise, when the decorator function is executed). The latter compiles the function once using a static compiler (PyCC), producing a module that is then imported by other scripts. PyCC compilation has additional restrictions (e.g. AFAIK prange is not supported, nor is fastmath, and I doubt CUDA is supported either). The export step is mandatory for ahead-of-time compilation (since no JIT compiler is used). Note that you can cache eagerly compiled functions using the flag cache=True in the Numba decorator (so the function should only be compiled once on the target platform).
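To make the difference concrete, here is a rough sketch of the PyCC ahead-of-time workflow described above (the module name aot_module and the exported function are just examples):

```python
# build_module.py -- run once to build a static extension module with PyCC
from numba.pycc import CC

cc = CC('aot_module')  # name of the generated extension module

@cc.export('square', 'f8(f8)')  # exported name and a fixed signature
def square(x):
    return x ** 2

if __name__ == '__main__':
    cc.compile()  # writes the extension file (e.g. aot_module.*.so) to disk
```

Other scripts then simply do import aot_module and call aot_module.square(3.0), with no JIT compiler involved at runtime. By contrast, caching an eagerly compiled CPU function only needs something like @njit('f8(f8)', cache=True).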
PS: I clarified the linked question (which previously wrongly mentioned ahead-of-time compilation).
Upvotes: 2