Reputation: 982
I want to make a fair comparison between different machine learning models. However, I find that the ridge regression model automatically uses multiple processors, and there is no parameter (such as n_jobs) with which I can restrict the number of processors used. Is there any possible way to solve this problem?
A minimal example:
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
features, target = make_regression(n_samples=10000, n_features=1000)
r = RidgeCV()
r.fit(features, target)
print(r.score(features, target))
Upvotes: 2
Views: 1064
Reputation: 6260
Based on the docs for RidgeCV:
Ridge regression with built-in cross-validation. By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.
And by default you use cv=None, i.e. the efficient Leave-One-Out cross-validation.
An alternate approach with Ridge regression and cross-validation:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

clf = Ridge(alpha=1.0)
# n_jobs=1 keeps the cross-validation folds on a single process
scores = cross_val_score(clf, features, target, cv=5, n_jobs=1)
print(scores)
See also the docs of Ridge and cross_val_score.
Upvotes: 0
Reputation: 3026
Trying to expand further on @PV8's answer, what happens whenever you instantiate RidgeCV() without explicitly setting the cv parameter (as in your case) is that an efficient Leave-One-Out cross-validation is run (according to the algorithms referenced here, implementation here).
On the other hand, when explicitly passing the cv parameter to RidgeCV(), this happens:
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

model = Ridge()
parameters = {'alpha': [0.1, 1.0, 10.0]}
gs = GridSearchCV(model, param_grid=parameters)
gs.fit(features, target)
print(gs.best_score_)
(as you can see here), namely that you'll use GridSearchCV with its default n_jobs=None.
Most importantly, as pointed out by one of the sklearn core developers here, the issue you are experiencing might not depend on sklearn, but rather on
[...] your numpy setup performing vectorized operations with parallelism.
(where the vectorized operations are performed within the computationally efficient LOO cross-validation procedure that you are implicitly calling by not passing cv to RidgeCV()).
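One practical way to act on that observation is to cap the native thread pools that numpy's vectorized operations use. This is just an illustrative sketch (threadpoolctl is not mentioned in the linked discussion, and features/target come from the question):
from threadpoolctl import threadpool_limits
from sklearn.linear_model import RidgeCV

# Limit the BLAS/OpenMP thread pools that numpy relies on,
# so the implicit LOO procedure stays on a single core.
with threadpool_limits(limits=1):
    r = RidgeCV()
    r.fit(features, target)
    print(r.score(features, target))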
Upvotes: 0
Reputation: 2948
If you set the environment variable OMP_NUM_THREADS to n, you will get the expected behaviour. E.g. on Linux, run export OMP_NUM_THREADS=1 in the terminal to restrict the computation to 1 CPU.
Depending on your system, you can also set it directly in Python. See e.g. How to set environment variables in Python?
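A minimal sketch of setting it from Python, reusing the example from the question; note that the variable has to be set before numpy/sklearn initialize their thread pools, i.e. before those imports:
import os

# Must be set before numpy / sklearn are imported, otherwise their
# thread pools are already initialized with the default thread count.
os.environ["OMP_NUM_THREADS"] = "1"

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

features, target = make_regression(n_samples=10000, n_features=1000)
r = RidgeCV()
r.fit(features, target)
print(r.score(features, target))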
Upvotes: 3
Reputation: 302
Try to take a look at sklearn.utils.parallel_backend; I think you can set the number of cores used for the computation with its n_jobs parameter.
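A minimal sketch of that idea (features and target are taken from the question; note this context manager controls joblib-based parallelism, so it may not affect the BLAS-level parallelism used by RidgeCV's efficient LOO procedure):
from sklearn.utils import parallel_backend
from sklearn.linear_model import RidgeCV

# Limit joblib-managed parallelism to a single worker for everything
# executed inside this context manager.
with parallel_backend('loky', n_jobs=1):
    r = RidgeCV()
    r.fit(features, target)
    print(r.score(features, target))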
Upvotes: -1