I've been using numba together with multiprocessing.
The only problem is that numba recompiles the code in every process separately.
(It's not much of a problem when the number of processes equals the number of physical CPUs, but a huge one when that is not the case!)
Is there any way to make numba compile code once, and then share the compiled artifacts across process boundaries?
Example:
from multiprocessing import Process
from time import time
from numba import njit

@njit
def child():
    pass

if __name__ == "__main__":
    # every child process recompiles child() from scratch
    ps = [Process(target=child) for _ in range(100)]
    for p in ps:
        p.start()
    s = time()
    for p in ps:
        p.join()
    print("compile time:", time() - s)
compile time: 19.10037922859192
CPU usage is pegged at 100% on all cores for the whole run. I've tried numba's cache=True, but my code is unfortunately uncachable:
/Users/dev/PycharmProjects/trading/tradingdo/strategy.py:91: NumbaWarning: Cannot cache compiled function "_strategy1" as it uses dynamic globals (such as ctypes pointers and large global arrays)
@njit
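For reference, this is how on-disk caching is normally enabled; a minimal sketch (the function name is illustrative), which fails for my real function only because it uses dynamic globals:

from numba import njit

@njit(cache=True)  # persists the compiled machine code in a __pycache__ directory
def cached_child():
    pass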
Upvotes: 3
Views: 1798
On systems with fork() support (Linux), this is easy: simply compile the function once, before starting the processes. This makes numba cache the compiled output in memory, as it normally does, and thanks to fork's copy-on-write semantics that cache is automatically shared with the child processes!
What's not so clear is how to do this on systems without proper fork() support. Can numba's cache be pickled?
from multiprocessing import Process
from time import time
from numba import njit

@njit
def child():
    pass

if __name__ == "__main__":
    child()  # compile in the parent before forking; the children inherit the result
    ps = [Process(target=child) for _ in range(100)]
    for p in ps:
        p.start()
    s = time()
    for p in ps:
        p.join()
    print("compile time:", time() - s)
compile time: 0.011722326278686523
It's also worth looking at numba's nogil=True option. It can eliminate the need for processes altogether, and threads share the numba compilation cache just fine; a sketch follows.
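A minimal sketch of the threading alternative, assuming the workload actually benefits from releasing the GIL (the executor setup here is illustrative, not part of the original code):

from concurrent.futures import ThreadPoolExecutor
from numba import njit

@njit(nogil=True)  # the jitted code releases the GIL while it runs
def child():
    pass

if __name__ == "__main__":
    child()  # compile once; every thread reuses the same in-process cache
    with ThreadPoolExecutor(max_workers=100) as ex:
        futures = [ex.submit(child) for _ in range(100)]
        for f in futures:
            f.result()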
Upvotes: 2