Reputation: 1395
How does GridSearchCV with n_jobs set to a value greater than 1 actually work? Does it create a separate instance of the classifier for each computation node, or does it create a single classifier that is shared by all the nodes? I am asking because I am using vowpal_wabbit's Python wrapper (https://github.com/josephreisinger/vowpal_porpoise/blob/master/vowpal_porpoise/vw.py) and see that it opens a subprocess (with stdin, stdout, stderr, etc.). However, when I use it from GridSearchCV with n_jobs > 1, I get a broken pipe error after some time, and I am trying to understand why.
Upvotes: 0
Views: 11061
Reputation: 1266
One of the questions in the comments was:
Which one is better, to use n_jobs=-1 or n_jobs with a big number like 32 ?!
This depends on what you mean by "better". I would say it depends on the hardware you currently have available and on how much of it you want to give to the algorithm.
The documentation says that n_jobs=-1 uses all processors (for instance, hardware threads). Therefore, if your hardware actually supports 32 threads, GridSearchCV() will use all 32 of them. If you decrease the number further (n_jobs=-2, n_jobs=-3, and so forth), you get all processors but one, all but two, and so on: the number of jobs is the number of available processors plus 1 plus n_jobs. For example, when 8 jobs are possible, 7 jobs will be instantiated with n_jobs=-2.
But it is a little more complicated than this: the number of jobs specified with n_jobs in GridSearchCV() does not have to match the number of threads Python actually uses, because other sources may be using processors as well.
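The rule for negative values described above can be sketched in a few lines. The helper name resolve_n_jobs and its n_cpus argument are made up for this illustration; as far as I know, joblib exposes the same logic for the current machine as joblib.effective_n_jobs, but this sketch does not depend on it.

```python
def resolve_n_jobs(n_jobs, n_cpus):
    """Translate an n_jobs setting into a concrete number of workers.

    Hypothetical helper for illustration only; n_cpus is passed in
    explicitly so the arithmetic is easy to follow.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on,
        # but never fewer than one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    # Positive values are taken literally, even if they exceed n_cpus.
    return n_jobs


if __name__ == "__main__":
    for setting in (-1, -2, -3, 32):
        print(setting, "->", resolve_n_jobs(setting, n_cpus=8))
```

Note that a positive value such as 32 is not capped by this rule, so on an 8-thread machine n_jobs=32 would start more workers than there are processors, while n_jobs=-1 would not.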
Upvotes: 1
Reputation: 40169
n_jobs > 1 will make GridSearchCV use Python's multiprocessing module under the hood. That means that the original estimator instance will be copied (pickled) to be sent over to the worker Python processes, so each worker gets its own copy rather than sharing a single instance. All scikit-learn models MUST be picklable. If vowpal_porpoise opens pipes to a vw subprocess in the constructor, it has to close them and reopen them around the pickling / unpickling steps by defining custom __getstate__ and __setstate__ methods. Have a look at the Python documentation for more details.
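A minimal sketch of the __getstate__ / __setstate__ idea, under stated assumptions: PipeBackedModel is a hypothetical stand-in and not the vowpal_porpoise class, and the child process is a small Python echo script rather than vw. The sketch drops the process handle from the pickled state and starts a fresh child when the copy is unpickled.

```python
import pickle
import subprocess
import sys

# Stand-in child process (an assumption): echoes every stdin line back.
# A real wrapper would launch the vw binary here instead.
ECHO_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n"
)


class PipeBackedModel:
    """Hypothetical estimator-like object that owns a child process."""

    def __init__(self, passes=1):
        self.passes = passes
        self._start()

    def _start(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-u", "-c", ECHO_CHILD],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def close(self):
        if self._proc is not None:
            self._proc.stdin.close()   # EOF lets the child exit
            self._proc.wait()
            self._proc.stdout.close()
            self._proc = None

    def echo(self, message):
        self._proc.stdin.write(message + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().rstrip("\n")

    def __getstate__(self):
        # Pickle only the plain parameters; the Popen handle and its
        # pipes cannot be pickled.
        state = self.__dict__.copy()
        state.pop("_proc", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._start()  # the copy gets its own child process and pipes


if __name__ == "__main__":
    original = PipeBackedModel(passes=3)
    copy = pickle.loads(pickle.dumps(original))
    print(copy.passes, copy.echo("hello"))
    original.close()
    copy.close()
```

Each unpickled copy talks to its own child, so the worker processes no longer write into pipes that belong to the parent.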
The subprocess should probably be closed and reopened upon the call to the set_params method, so that the model picks up the new parameter values.
It would be easier not to open the subprocess in the constructor at all: open it on demand in the fit and predict methods and close it again each time.
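The on-demand variant might look roughly like the sketch below. OnDemandModel, its scale parameter, and the toy child script (which just multiplies each input line by scale) are all assumptions made for illustration; a real wrapper would run vw and parse its output instead.

```python
import subprocess
import sys

# Toy child process (an assumption): multiplies every number read from
# stdin by the factor given as its first argument.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(float(line) * float(sys.argv[1]))\n"
)


class OnDemandModel:
    """Hypothetical estimator that holds no process between calls."""

    def __init__(self, scale=1.0):
        # Only plain parameters are stored, so the default pickling works.
        self.scale = scale

    def _run_child(self, values):
        payload = "".join(f"{v}\n" for v in values)
        # subprocess.run starts the child, feeds it the input, waits for
        # it to finish, and closes the pipes before returning.
        result = subprocess.run(
            [sys.executable, "-c", CHILD, str(self.scale)],
            input=payload, capture_output=True, text=True, check=True,
        )
        return [float(line) for line in result.stdout.splitlines()]

    def fit(self, X, y=None):
        self.n_seen_ = len(self._run_child(X))
        return self

    def predict(self, X):
        return self._run_child(X)


if __name__ == "__main__":
    model = OnDemandModel(scale=2.0).fit([1, 2, 3])
    print(model.predict([1, 2.5]))
```

The trade-off is the cost of starting a process on every call, in exchange for an object that carries no open pipes and needs no custom pickling code.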
Upvotes: 4