David

Reputation: 121

Does n_jobs=-1 in scikit-learn use all cores? Or all available cores?

I am using RandomForestRegressor and I want to use the largest number of trees without adding to the total runtime. My dubious assumption is that if my computer has 100 cores, by specifying a number of trees that is a multiple of 100, I am getting the most bang for my buck. Is this necessarily true?

The regression task is being performed within a hyperparameter optimization procedure, and since I specified n_jobs=1 for this tuning procedure, I am not sure if the number of trees should actually be a multiple of 99 since at least one core may be occupied.
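
For concreteness, here is a minimal sketch of the setup I mean (GridSearchCV is just a stand-in for my actual tuning procedure):

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    # inner estimator fits its trees in parallel
    forest = RandomForestRegressor(n_estimators=100, n_jobs=-1)

    # outer hyperparameter search is kept serial (n_jobs=1)
    search = GridSearchCV(forest, {"max_depth": [None, 10]}, n_jobs=1)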

The post How does scikit-learn handle multiple n_jobs arguments? is similar, but the top answer quotes the documentation, which states that n_jobs=-1 corresponds to "all processors". Again, I'm not sure if this literally means all processors, or all available processors (or whether these would be the same in my situation).

Upvotes: 0

Views: 2269

Answers (1)

Ahmed AEK

Reputation: 17496

Using n_jobs=-1 will spawn as many workers as multiprocessing.cpu_count() returns, which is the number of logical CPUs in your system (physical cores plus hyper-threaded ones, across all sockets).

Say you have an 8-core, 16-thread processor: 16 workers will be spawned to do the work for you in parallel.
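
If you want to see what that count is on your machine, here is a quick sketch (joblib is the library scikit-learn delegates this parallelism to):

    import multiprocessing
    import joblib

    print(multiprocessing.cpu_count())  # logical CPUs visible to the OS
    print(joblib.cpu_count())           # what joblib treats as "all processors"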

The OS decides which core each worker runs on, but it will usually distribute one worker per core when there are as many cores as workers.

The number of trees doesn't have to be a multiple of that number, for several reasons, including (see the sketch after this list):

  1. You will never really reach 100% utilization or get an 8x speedup from using all 8 cores (let alone the 16 threads), simply because of how computers behave under load.
  2. Training models is usually limited by your memory bandwidth, if anything, rather than by raw core count.

So you don't really need to worry much about matching the number of trees to the number of workers; the real bottleneck isn't there anyway.
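
To make that concrete, a minimal sketch (the sizes are made up): 250 trees is not a multiple of any common core count, and joblib simply hands trees to workers as they become free:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=5_000, n_features=20, random_state=0)

    # 250 is deliberately not a multiple of the worker count
    model = RandomForestRegressor(n_estimators=250, n_jobs=-1, random_state=0)
    model.fit(X, y)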

Just use -1 if your PC has enough memory for it. If -1 causes your PC to run out of memory (and therefore run even slower), keep decreasing that number until you are nearly out of memory, but don't actually reach the out-of-memory state, as your PC will then start using disk as memory, which is much slower.
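
Here are two ways to cap the worker count below -1, as a sketch; the context-manager form works because scikit-learn treats n_jobs=None as "use the active joblib backend's default":

    from joblib import parallel_backend
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    X, y = make_regression(n_samples=5_000, n_features=20, random_state=0)

    # option 1: set the worker count on the estimator directly
    model = RandomForestRegressor(n_estimators=250, n_jobs=8).fit(X, y)

    # option 2: leave n_jobs unset and cap it for a whole block of calls
    model = RandomForestRegressor(n_estimators=250)
    with parallel_backend("loky", n_jobs=8):
        model.fit(X, y)  # runs with at most 8 workers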

Manually setting a number higher than your core count will only slow the work down and consume more memory than necessary. A higher number of workers isn't always faster, and beyond a certain point it becomes much slower than a small number of workers.

If the resulting performance is still not good enough, you should look into distributed computing as outlined in the answer you linked; it is the basic workaround for bandwidth-limited tasks, as you get access to "more RAM sticks".

Upvotes: 1
