Ginger

Reputation: 8650

Cross Validation, scikit-learn, parallel is slower

I am learning how to use scikit-learn.

When testing the cross validation function, if I turn on parallel computing using

cross_validation.cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=-1)

the result is a lot slower than if I turn it off using

cross_validation.cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=1)

How can I stop this?

I am using PyDev and Anaconda 3.3 on a 64-bit Windows 7 machine. Task Manager shows many instances of Python being started and stopped, which appears to be the cause of the performance hit. Why do they not start, and stay started?
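The comparison described above can be reproduced with a small timing script. This is only a sketch: the question does not show how svc and kfold were built, so the linear SVC, the 3-fold split, and the helper name time_cv are assumptions. It also assumes a recent scikit-learn, where cross_val_score lives in sklearn.model_selection rather than the older cross_validation module.

```python
import time

from sklearn import datasets, svm
from sklearn.model_selection import KFold, cross_val_score


def time_cv(n_jobs):
    """Return (elapsed_seconds, scores) for one cross-validation run."""
    X_digits, y_digits = datasets.load_digits(return_X_y=True)
    svc = svm.SVC(C=1, kernel="linear")   # assumed estimator settings
    kfold = KFold(n_splits=3)             # assumed number of folds
    start = time.perf_counter()
    scores = cross_val_score(svc, X_digits, y_digits, cv=kfold, n_jobs=n_jobs)
    return time.perf_counter() - start, scores


# The guard matters on Windows, where worker processes may re-import this module.
if __name__ == "__main__":
    for n_jobs in (1, -1):
        elapsed, scores = time_cv(n_jobs)
        print(f"n_jobs={n_jobs}: {elapsed:.2f}s, mean score {scores.mean():.3f}")
```

On a dataset this small, each fold fits quickly, so the cost of starting worker processes can easily outweigh the time saved by running folds in parallel.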

Upvotes: 2

Views: 3023

Answers (2)

Nikolay Petrov

Reputation: 88

You can try an accelerated implementation of the algorithms, such as scikit-learn-intelex: https://github.com/intel/scikit-learn-intelex

For SVM you would certainly get higher compute efficiency; for such large datasets, however, the run time would still be noticeable.

First, install the package:

pip install scikit-learn-intelex

Then add the following to your Python script, before importing anything from scikit-learn:

from sklearnex import patch_sklearn
patch_sklearn()
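Put together with the cross-validation call from the question, the patch might be used roughly as follows. This is a sketch under assumptions: the fallback to stock scikit-learn when sklearnex is not installed, the linear kernel, cv=3, and the helper name run_cv are all illustrative and not taken from the question.

```python
# Assumption: fall back to stock scikit-learn if the optional package is missing.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()  # call this before the scikit-learn imports below
except ImportError:
    pass

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score


def run_cv():
    """Cross-validate an SVC on the digits data (patched if sklearnex is present)."""
    X, y = datasets.load_digits(return_X_y=True)
    return cross_val_score(svm.SVC(kernel="linear"), X, y, cv=3)


if __name__ == "__main__":
    print(run_cv())
```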

Upvotes: 0

Fred Foo

Reputation: 363787

Why do they not start, and stay started?

Because that's not how the multiprocessing module in Python works at present, and that's what scikit-learn uses internally. In Python 3.4, this will be fixed at least for POSIX (Linux, Mac OS X) platforms. I don't believe the CPython developers intend to fix this for Windows as well. Lightweight parallel processing for scikit-learn is in the works, but a release is still some time away.
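To get a feel for what each started-and-stopped Python instance costs, the startup time of a fresh interpreter can be measured with the standard library alone. This is a rough sketch, not a measurement of scikit-learn itself; the helper name interpreter_startup_seconds is hypothetical, and real workers also have to import NumPy and scikit-learn, so their actual cost is likely higher.

```python
import subprocess
import sys
import time


def interpreter_startup_seconds(repeats=3):
    """Average wall-clock time to start and stop one fresh Python process."""
    start = time.perf_counter()
    for _ in range(repeats):
        # Launch the same interpreter that runs this script and exit immediately.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    print(f"~{interpreter_startup_seconds():.3f}s per fresh interpreter")
```

If this number is comparable to the time a single fold takes with n_jobs=1, the parallel run has little chance of coming out ahead.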

Upvotes: 1
