Reputation: 1307
I want to parallelize my model-building procedure using scikit-learn. I wonder whether it makes sense to parallelize both the outer and the inner loop (i.e. setting n_jobs = -1 both for GridSearchCV and for cross_validate)?
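In other words, a setup roughly like this minimal sketch (data, estimator and parameter grid are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_validate

# purely illustrative data and estimator
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# inner loop: hyperparameter search, asked to use every core
inner = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
    n_jobs=-1,
)

# outer loop: cross-validation of the whole search, also asked to use every core
outer_scores = cross_validate(inner, X, y, cv=5, n_jobs=-1)
print(outer_scores["test_score"])
```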
Upvotes: 2
Views: 309
Reputation: 1
A longer version needs a bit of understanding of how the n_jobs settings are actually handled.
There are only a few expensive resources to share: the CPU cores themselves, the fastest and most expensive core-local cache hierarchy (not going as deep as cache lines and their respective associativity at this level), and the cheaper but far slower RAM. The n_jobs = -1 directive, in the first call signature executed, simply grabs all of these resources at once.
That means there are no reasonably "free" resources left for any "deeper" level that attempts, again, to use "as many resources" as are physically available (which a nested n_jobs = -1 does, and obeys again). With nothing left unharnessed by the first level, the result is just havoc in the scheduler as it tries to map/evict/map/evict/map/evict ever more processing jobs onto the same real (and already pretty busy) hardware elements.
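A minimal, hedged sketch (again with purely illustrative data and estimator), assuming you keep the "grab all cores" directive at one level only, here the outer cross_validate, while the inner GridSearchCV stays serial:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_validate

# purely illustrative data and estimator
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# inner loop kept serial, so it does not fight the outer loop for the cores
inner = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
    n_jobs=1,
)

# only the outer loop fans out across the physically available cores
outer_scores = cross_validate(inner, X, y, cv=5, n_jobs=-1)
```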
Often even the first level alone may create trouble on the RAM-allocation side: large models require as many replications of all the RAM data structures as the number of CPU cores "dictates", because a whole copy is effectively made during process instantiation, with all objects, used or not, replicated into each new process. The resulting memory swapping is definitely something you will never want to repeat.
Enjoy the model hyperparameter tuning - it is the crème de la crème of Machine Learning practice. Worth being good at.
Upvotes: 2