Vasco Sá

Reputation: 49

Issues with multiple jobs when using RandomizedSearchCV

I am trying to run a RandomizedSearchCV with a nested RFECV, which itself wraps a pipeline. This pipeline is a MinMaxScaler followed by one of 6 different classifiers. When I run this RandomizedSearchCV with n_jobs=1, everything works fine, but when I try to increase the number of jobs, I run into issues:

For instance, an SVC will show a FitFailedWarning on many of the folds:

[CV] ................. estimator__clf__C=0.1, score=nan, total=   2.9s
[Parallel(n_jobs=10)]: Done  12 tasks      | elapsed:    7.3s
[CV] estimator__clf__C=1 .............................................
C:\Users\user\Anaconda3\envs\npai_python37\lib\site-packages\sklearn\model_selection\_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
sklearn.exceptions.NotFittedError: This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

For the SVC, the search still completes despite the warnings. However, for a DecisionTreeClassifier it'll simply terminate the job:

joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

Any help as to why this might be happening when I try to use more than one job would be very much appreciated.

Thank you!

Upvotes: 2

Views: 771

Answers (1)

Vasco Sá

Reputation: 49

Ok, so it turns out the issue was very simple. I was setting n_jobs both in RFECV and in RandomizedSearchCV, which was causing all the errors. Setting n_jobs on only one of the two fixed it. I guess nesting parallel jobs is not a good idea.
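For anyone hitting the same thing, here is a minimal sketch of what "n_jobs in one place only" can look like. It is not my original code: the toy data, the `build_search` helper, and the `max_depth` grid are made up for illustration, and the RFECV here wraps a bare DecisionTreeClassifier rather than my pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy data, only for illustration (not the data from the question).
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=4, random_state=0
)


def build_search(n_jobs=2):
    """Hypothetical helper: parallelism is requested on the outer search only."""
    # Inner RFECV: n_jobs is left at its default (None), so it runs serially.
    selector = RFECV(estimator=DecisionTreeClassifier(random_state=0), cv=3)
    # Outer search: the only place where n_jobs is set.
    return RandomizedSearchCV(
        selector,
        param_distributions={"estimator__max_depth": [2, 4, 8]},
        n_iter=3,
        cv=3,
        n_jobs=n_jobs,
        random_state=0,
    )


if __name__ == "__main__":
    search = build_search(n_jobs=2)
    search.fit(X, y)
    print(search.best_params_)
```

Whether to put n_jobs on the outer search or on the inner RFECV probably depends on which loop has more independent fits; the sketch just picks the outer one.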

I also changed the code so that the pipeline is the outermost object, with the randomized search nested inside it as the final step (as opposed to the pipeline being nested all the way inside the RFECV). This saves computation on the MinMaxScaler step of the pipeline, which is always the same and no longer has to be refitted for every fit inside the search.
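A rough sketch of that restructured layout, again with invented names and toy data (the `build_pipeline` helper, the step names, and the linear SVC are assumptions for the example, not my actual setup):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Toy data, only for illustration.
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=4, random_state=0
)


def build_pipeline(n_jobs=2):
    """Hypothetical helper: scaler first, randomized search as the last step."""
    # A linear kernel is used so that SVC exposes coef_, which RFECV needs.
    selector = RFECV(estimator=SVC(kernel="linear"), cv=3)
    search = RandomizedSearchCV(
        selector,
        param_distributions={"estimator__C": [0.1, 1, 10]},
        n_iter=3,
        cv=3,
        n_jobs=n_jobs,  # still the only place n_jobs is set
        random_state=0,
    )
    # The scaler is fitted once, then the scaled data is handed to the search.
    return Pipeline([("scale", MinMaxScaler()), ("search", search)])


if __name__ == "__main__":
    pipe = build_pipeline(n_jobs=2)
    pipe.fit(X, y)
    print(pipe.named_steps["search"].best_params_)
```

One trade-off to be aware of with this layout: the scaler is fitted on all the data passed to `fit` before the search splits it into folds, so the validation folds inside the search are scaled with statistics that include them.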

Upvotes: 3
