Reputation: 982
I want to make a fair comparison between different machine learning models. However, I find that the ridge regression model automatically uses multiple processors, and there is no parameter (such as n_jobs) with which I can restrict the number of processors used. Is there any possible way to solve this problem?
A minimal example:
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
features, target = make_regression(n_samples=10000, n_features=1000)
r = RidgeCV()
r.fit(features, target)
print(r.score(features, target))
Upvotes: 2
Views: 1064
Reputation: 6260
Based on the docs for RidgeCV:
Ridge regression with built-in cross-validation. By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.
And by default you use cv=None, i.e. the efficient Leave-One-Out cross-validation.
An alternate approach with Ridge regression and cross-validation:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

clf = Ridge(alpha=1.0)
# n_jobs=1 keeps the cross-validation folds on a single process
scores = cross_val_score(clf, features, target, cv=5, n_jobs=1)
print(scores)
See also the docs of Ridge and cross_val_score.
Upvotes: 0
Reputation: 3026
Trying to expand further on @PV8's answer, what happens whenever you instantiate RidgeCV() without explicitly setting the cv parameter (as in your case) is that an efficient Leave-One-Out cross-validation is run (according to the algorithms referenced here, implementation here).
On the other hand, when explicitly passing the cv parameter to RidgeCV(), this happens:
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

model = Ridge()
parameters = {'alpha': [0.1, 1.0, 10.0]}
gs = GridSearchCV(model, param_grid=parameters)
gs.fit(features, target)
print(gs.best_score_)
(as you can see here), namely that you'll use GridSearchCV with its default n_jobs=None.
Most importantly, as pointed out by one of the sklearn core developers here, the issue you are experiencing might not depend on sklearn, but rather on
[...] your numpy setup performing vectorized operations with parallelism.
(where the vectorized operations are performed within the computationally efficient LOO cross-validation procedure that you are implicitly calling by not passing cv to RidgeCV()).
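One practical way to act on that observation is to cap the native thread pools that numpy's vectorized operations use. This is just an illustrative sketch (threadpoolctl is not mentioned in the linked discussion, and features/target come from the question):
from threadpoolctl import threadpool_limits
from sklearn.linear_model import RidgeCV

# Limit the BLAS/OpenMP thread pools that numpy relies on,
# so the implicit LOO procedure stays on a single core.
with threadpool_limits(limits=1):
    r = RidgeCV()
    r.fit(features, target)
    print(r.score(features, target))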
Upvotes: 0
Reputation: 2948
If you set the environment variable OMP_NUM_THREADS to n, you will get the expected behaviour. E.g. on Linux, run export OMP_NUM_THREADS=1 in the terminal to restrict the computation to 1 CPU.
Depending on your system, you can also set it directly in Python. See e.g. How to set environment variables in Python?
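A minimal sketch of setting it from Python, reusing the example from the question; note that the variable has to be set before numpy/sklearn initialize their thread pools, i.e. before those imports:
import os

# Must be set before numpy / sklearn are imported, otherwise their
# thread pools are already initialized with the default thread count.
os.environ["OMP_NUM_THREADS"] = "1"

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

features, target = make_regression(n_samples=10000, n_features=1000)
r = RidgeCV()
r.fit(features, target)
print(r.score(features, target))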
Upvotes: 3
Reputation: 302
Try to take a look at sklearn.utils.parallel_backend; I think you can set the number of cores used for the computation with its n_jobs parameter.
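A minimal sketch of that idea (features and target are taken from the question; note this context manager controls joblib-based parallelism, so it may not affect the BLAS-level parallelism used by RidgeCV's efficient LOO procedure):
from sklearn.utils import parallel_backend
from sklearn.linear_model import RidgeCV

# Limit joblib-managed parallelism to a single worker for everything
# executed inside this context manager.
with parallel_backend('loky', n_jobs=1):
    r = RidgeCV()
    r.fit(features, target)
    print(r.score(features, target))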
Upvotes: -1