Reputation: 8650
I am learning how to use scikit-learn.
When I test the cross-validation function with parallel computing turned on, using
cross_validation.cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=-1)
the call is a lot slower than with parallel computing turned off, using
cross_validation.cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=1)
How can I stop this slowdown?
I am using PyDev and Anaconda 3.3 on a 64-bit Windows 7 machine. Judging by Task Manager, the performance hit is caused by many instances of Python being started and stopped. Why do they not start, and stay started?
Upvotes: 2
Views: 3023
Reputation: 88
You can try an accelerated implementation of the algorithms, such as scikit-learn-intelex: https://github.com/intel/scikit-learn-intelex
For SVM you should get higher compute efficiency, although for such large datasets the run time would still be noticeable.
First install the package:
pip install scikit-learn-intelex
Then add the following to your Python script, before any scikit-learn imports:
from sklearnex import patch_sklearn
patch_sklearn()
Upvotes: 0
Reputation: 363787
Why do they not start, and stay started?
Because that's not how the multiprocessing module in Python works at present, and that's what scikit-learn uses internally. In Python 3.4, this will be fixed at least for POSIX (Linux, Mac OS X) platforms. I don't believe the CPython developers also intend to fix this for Windows. Light-weight parallel processing for scikit-learn is in the works, but a release is still some time away.
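To see the start-up cost in isolation, here is a minimal sketch using only the standard library. It is not what scikit-learn does internally; the names square, run_serial, run_with_fresh_pool and timed are made up for illustration, and the task is deliberately cheap so that starting and stopping worker processes dominates the run time.

```python
import multiprocessing as mp
import time


def square(x):
    """A deliberately cheap task, so process start-up cost dominates."""
    return x * x


def run_serial(values):
    """Do all the work in the current process."""
    return [square(v) for v in values]


def run_with_fresh_pool(values, workers=2):
    """Start worker processes, do the work, then shut them down again."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(square, values)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# The guard is required on Windows, where each worker re-imports this module.
if __name__ == "__main__":
    values = list(range(1000))
    serial_result, serial_time = timed(run_serial, values)
    pool_result, pool_time = timed(run_with_fresh_pool, values)
    assert serial_result == pool_result
    print(f"serial: {serial_time:.4f}s, fresh pool: {pool_time:.4f}s")
```

On a small workload like this, the pool version is usually the slower one, and the gap tends to be larger on Windows because every worker is a freshly started interpreter. Parallelism only pays off once each task takes much longer than starting a process.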
Upvotes: 1