I've been using numba together with multiprocessing.
The only problem is that numba recompiles the code in every process separately.
(It's not much of a problem when the number of processes equals the number of physical CPUs, but a huge one when that is not the case!)
Is there any way to make numba compile code once, and then share the compiled artifacts across process boundaries?
Example:
from multiprocessing import Process
from time import time
from numba import njit

@njit
def child():
    pass

if __name__ == "__main__":
    # every child process recompiles child() from scratch
    ps = [Process(target=child) for _ in range(100)]
    for p in ps:
        p.start()
    s = time()
    for p in ps:
        p.join()
    print("compile time:", time() - s)
compile time: 19.10037922859192
CPU usage is pegged at 100% on all cores for the whole run. I've tried numba's cache=True, but my code is unfortunately uncachable:
/Users/dev/PycharmProjects/trading/tradingdo/strategy.py:91: NumbaWarning: Cannot cache compiled function "_strategy1" as it uses dynamic globals (such as ctypes pointers and large global arrays)
@njit
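For reference, this is how on-disk caching is normally enabled; a minimal sketch (the function name is illustrative), which fails for my real function only because it uses dynamic globals:

from numba import njit

@njit(cache=True)  # persists the compiled machine code in a __pycache__ directory
def cached_child():
    pass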
Upvotes: 3
Views: 1798
On systems with fork() support (Linux), this is easy: simply compile the function once, before starting the processes. This makes numba cache the compiled output in memory, as it normally does, and thanks to fork's copy-on-write semantics that cache is automatically shared with the child processes!
What's not so clear is how to do this on systems without proper fork() support. Can numba's cache be pickled?
from multiprocessing import Process
from time import time
from numba import njit

@njit
def child():
    pass

if __name__ == "__main__":
    child()  # compile in the parent before forking; the children inherit the result
    ps = [Process(target=child) for _ in range(100)]
    for p in ps:
        p.start()
    s = time()
    for p in ps:
        p.join()
    print("compile time:", time() - s)
compile time: 0.011722326278686523
It's also worth looking at numba's nogil=True option. It can eliminate the need for processes altogether, and threads share the numba compilation cache just fine; a sketch follows.
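A minimal sketch of the threading alternative, assuming the workload actually benefits from releasing the GIL (the executor setup here is illustrative, not part of the original code):

from concurrent.futures import ThreadPoolExecutor
from numba import njit

@njit(nogil=True)  # the jitted code releases the GIL while it runs
def child():
    pass

if __name__ == "__main__":
    child()  # compile once; every thread reuses the same in-process cache
    with ThreadPoolExecutor(max_workers=100) as ex:
        futures = [ex.submit(child) for _ in range(100)]
        for f in futures:
            f.result()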
Upvotes: 2