Reputation: 3257
My code looks something like this:
from glob import glob
from joblib import Parallel, delayed

# prediction model - 10s of megabytes on disk
LARGE_MODEL = load_model('path/to/model')
file_paths = glob('path/to/files/*')

def do_thing(file_path):
    pred = LARGE_MODEL.predict(load_image(file_path))
    return pred

Parallel(n_jobs=2)(delayed(do_thing)(fp) for fp in file_paths)
My question is whether LARGE_MODEL will be pickled/unpickled with each iteration of the loop. And if so, how can I make sure each worker caches it instead (if that's possible)?
Upvotes: 6
Views: 1547
Reputation: 38942
TLDR
The parent process pickles the large model once. That can be made more performant by making sure the large model is a numpy array backed by a memory-mapped file. Workers can then load_temporary_memmap it, which is much faster than reading it from disk.
Your job is parallelized and will most likely be using joblib's LokyBackend (joblib._parallel_backends.LokyBackend).

In joblib.parallel.Parallel.__call__, joblib initializes the backend, and uses LokyBackend when n_jobs is set to a count greater than 1.

LokyBackend uses a shared temporary folder for the same Parallel object. This is relevant for reducers that modify the default pickling behavior. The LokyBackend configures a MemmappingExecutor that shares this folder with the reducers.
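As an aside, Parallel exposes the relevant memmapping knobs directly (max_nbytes, mmap_mode and temp_folder are real Parallel parameters). Below is a minimal sketch, assuming the model reduces to a plain numpy array of weights and passing it as an explicit argument so the reducer handles it; the dummy "prediction" stands in for your real predict call:

from glob import glob

import numpy as np
from joblib import Parallel, delayed

# Hypothetical stand-in for the model: a plain numpy array of weights (~40 MB).
LARGE_WEIGHTS = np.random.rand(5_000_000)
file_paths = glob('path/to/files/*')

def do_thing(file_path, weights):
    # `weights` arrives in the worker as a read-only memmap, not a full copy.
    return float(weights[:10].sum())  # placeholder for the real prediction

results = Parallel(
    n_jobs=2,
    max_nbytes='1M',         # arrays larger than this are memmapped instead of copied
    mmap_mode='r',           # workers receive a read-only memory map
    temp_folder='/dev/shm',  # shared folder used by the MemmappingExecutor (Linux tmpfs)
)(delayed(do_thing)(fp, LARGE_WEIGHTS) for fp in file_paths)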
If you have numpy installed and your model is a clean numpy array, it is guaranteed to be pickled once as a memmapped file by the ArrayMemmapForwardReducer and passed from the parent to the child processes (provided it is larger than Parallel's max_nbytes threshold, 1 MB by default). Otherwise it is pickled with the default pickler as a bytes object.
You can see how your model was pickled in the parent process by reading joblib's debug logs.
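One way that tends to surface those messages (an assumption about joblib's verbosity handling: verbosity above roughly 50 is forwarded to the backend and its reducers) is simply to raise the verbose level on Parallel:

import numpy as np
from joblib import Parallel, delayed

big_array = np.zeros(5_000_000)  # large enough to trigger memmapping

# At a high verbose level you should see messages along the lines of
# "Memmapping (shape=..., dtype=...) to new file ..." for arrays that get
# memmapped, or a plain pickling message otherwise.
Parallel(n_jobs=2, verbose=100)(
    delayed(np.mean)(big_array) for _ in range(4)
)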
Each worker 'unpickles' the large model anyway, so there is really no point in caching it there. What you can improve is the source the workers load the pickled model from, by backing your model with a memory-mapped file.
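If your model does not reduce to a single numpy array, one option is to dump it once yourself and have each worker open it as a memory map instead of re-reading the whole file. This is only a sketch, assuming the model's weights can be saved as an array; joblib.dump and joblib.load with mmap_mode are real APIs, while the path and the dummy prediction are placeholders:

import numpy as np
from joblib import Parallel, delayed, dump, load

# One-off: persist the model's weights (a dummy array here) to disk.
weights = np.random.rand(5_000_000)
dump(weights, '/tmp/large_model_weights.joblib')

def do_thing(file_path):
    # mmap_mode='r' maps the arrays inside the dump instead of copying them,
    # so repeated loads in the same worker stay cheap.
    weights = load('/tmp/large_model_weights.joblib', mmap_mode='r')
    return float(weights[:10].sum())  # placeholder for the real prediction

results = Parallel(n_jobs=2)(delayed(do_thing)(fp) for fp in ['a.png', 'b.png'])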
Upvotes: 5