Reputation: 3257
My code looks something like this:
from glob import glob
from joblib import Parallel, delayed

# prediction model - 10s of megabytes on disk
LARGE_MODEL = load_model('path/to/model')
file_paths = glob('path/to/files/*')

def do_thing(file_path):
    pred = LARGE_MODEL.predict(load_image(file_path))
    return pred

Parallel(n_jobs=2)(delayed(do_thing)(fp) for fp in file_paths)
My question is whether LARGE_MODEL will be pickled/unpickled with each iteration of the loop. And if so, how can I make sure each worker caches it instead (if that's possible)?
Upvotes: 6
Views: 1547
Reputation: 38942
TLDR
The parent process pickles the large model once. That can be made more performant by making sure the large model is a numpy array backed by a memory-mapped file. Workers can then load_temporary_memmap it, which is much faster than reading it from disk.
Your job is parallelized and will most likely be using joblib's LokyBackend (joblib._parallel_backends.LokyBackend).

In joblib.parallel.Parallel.__call__, joblib initializes the backend, and uses LokyBackend when n_jobs is set to a count greater than 1.

LokyBackend uses a shared temporary folder for the same Parallel object. This is relevant for reducers that modify the default pickling behavior. The LokyBackend configures a MemmappingExecutor that shares this folder with the reducers.
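As an aside, Parallel exposes the relevant memmapping knobs directly (max_nbytes, mmap_mode and temp_folder are real Parallel parameters). Below is a minimal sketch, assuming the model reduces to a plain numpy array of weights and passing it as an explicit argument so the reducer handles it; the dummy "prediction" stands in for your real predict call:

from glob import glob

import numpy as np
from joblib import Parallel, delayed

# Hypothetical stand-in for the model: a plain numpy array of weights (~40 MB).
LARGE_WEIGHTS = np.random.rand(5_000_000)
file_paths = glob('path/to/files/*')

def do_thing(file_path, weights):
    # `weights` arrives in the worker as a read-only memmap, not a full copy.
    return float(weights[:10].sum())  # placeholder for the real prediction

results = Parallel(
    n_jobs=2,
    max_nbytes='1M',         # arrays larger than this are memmapped instead of copied
    mmap_mode='r',           # workers receive a read-only memory map
    temp_folder='/dev/shm',  # shared folder used by the MemmappingExecutor (Linux tmpfs)
)(delayed(do_thing)(fp, LARGE_WEIGHTS) for fp in file_paths)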
If you have numpy installed and your model is a clean numpy array, it is guaranteed to be pickled once as a memmapped file by the ArrayMemmapForwardReducer and passed from the parent to the child processes (provided it is larger than Parallel's max_nbytes threshold, 1 MB by default). Otherwise it is pickled with the default pickler as a bytes object.
You can see how your model was pickled in the parent process by reading joblib's debug logs.
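One way that tends to surface those messages (an assumption about joblib's verbosity handling: verbosity above roughly 50 is forwarded to the backend and its reducers) is simply to raise the verbose level on Parallel:

import numpy as np
from joblib import Parallel, delayed

big_array = np.zeros(5_000_000)  # large enough to trigger memmapping

# At a high verbose level you should see messages along the lines of
# "Memmapping (shape=..., dtype=...) to new file ..." for arrays that get
# memmapped, or a plain pickling message otherwise.
Parallel(n_jobs=2, verbose=100)(
    delayed(np.mean)(big_array) for _ in range(4)
)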
Each worker 'unpickles' the large model anyway, so there is really no point in caching it there. What you can improve is the source the workers load the pickled model from, by backing your model with a memory-mapped file.
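If your model does not reduce to a single numpy array, one option is to dump it once yourself and have each worker open it as a memory map instead of re-reading the whole file. This is only a sketch, assuming the model's weights can be saved as an array; joblib.dump and joblib.load with mmap_mode are real APIs, while the path and the dummy prediction are placeholders:

import numpy as np
from joblib import Parallel, delayed, dump, load

# One-off: persist the model's weights (a dummy array here) to disk.
weights = np.random.rand(5_000_000)
dump(weights, '/tmp/large_model_weights.joblib')

def do_thing(file_path):
    # mmap_mode='r' maps the arrays inside the dump instead of copying them,
    # so repeated loads in the same worker stay cheap.
    weights = load('/tmp/large_model_weights.joblib', mmap_mode='r')
    return float(weights[:10].sum())  # placeholder for the real prediction

results = Parallel(n_jobs=2)(delayed(do_thing)(fp) for fp in ['a.png', 'b.png'])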
Upvotes: 5