Reputation: 31
I would like to use scikit-learn's LassoCV/RidgeCV while fitting a StandardScaler on the training set of each fold. To avoid leakage, I do not want to apply the scaler before the cross-validation, but I cannot figure out how to do that with LassoCV/RidgeCV.
Is there a way to do this? Or should I create a pipeline with Lasso/Ridge and search for the hyperparameters 'manually' (using GridSearchCV, for instance)?
Many thanks.
Upvotes: 1
Views: 1129
Reputation: 33147
If you want the scaling applied within each cross-validation iteration, you can use the make_pipeline function: when the pipeline is cross-validated, "fit" is called on each training fold, and the fitted scaler is then only used to "transform" the corresponding test fold.
The make_my_pipe object below can be treated as an estimator with a StandardScaler attached to it.
code:
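from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score  # replaces the old sklearn.cross_validation module
from sklearn.linear_model import Ridge
# Placeholder data; substitute your own feature matrix X and target vector y
X, y = make_regression(n_samples=100, n_features=5, random_state=0)
# The scaler is refit on the training portion of every fold
make_my_pipe = make_pipeline(StandardScaler(), Ridge())
scores = cross_val_score(make_my_pipe, X, y)
print(scores)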
Upvotes: 0
Reputation: 31
I got the answer through the scikit-learn mailing list so here it is:
'There is no way to use the "efficient" EstimatorCV objects with pipelines. This is an API bug and there's an open issue and maybe even a PR for that.'
Many thanks to Andreas Mueller for the answer.
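For reference, the fallback raised in the question (a pipeline with a plain Lasso, searched with GridSearchCV) could look roughly like the sketch below. It is only an illustration: the synthetic data, the step names "scaler" and "model", the alpha grid, and the scoring choice are all assumptions, not something from the mailing list reply.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): replace with your own X and y.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Scaler and model live in one estimator, so GridSearchCV refits the
# scaler on the training portion of every fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", Lasso(max_iter=10000)),
])

# Hypothetical alpha grid; "model__alpha" targets the Lasso step by name.
param_grid = {"model__alpha": np.logspace(-3, 1, 9)}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

best_alpha = search.best_params_["model__alpha"]
print(best_alpha)
```

This gives up the efficient regularization-path computation of LassoCV, since every alpha is refit from scratch on every fold, but the scaler never sees the held-out fold. Swapping Lasso for Ridge works the same way.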
Upvotes: 2