Reputation: 648
Let's consider a multi-output (multivariate) regression problem with two response variables: Latitude and Longitude. A few machine learning model implementations, such as Support Vector Regression (sklearn.svm.SVR), do not currently provide native support for multi-output regression. For this reason, sklearn.multioutput.MultiOutputRegressor can be used:
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
svr_multi = MultiOutputRegressor(SVR(), n_jobs=-1)
# Fit the algorithm on the data
svr_multi.fit(X_train, y_train)
y_pred = svr_multi.predict(X_test)
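To make the wrapper's behaviour concrete, here is a minimal sketch showing that MultiOutputRegressor fits one independent SVR per target column. The synthetic arrays X_demo and y_demo are assumptions standing in for the real Latitude/Longitude data, which is not shown in the question.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption): 60 samples, 4 features, 2 targets
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = np.column_stack([X_demo[:, 0] + X_demo[:, 1],
                          X_demo[:, 2] - X_demo[:, 3]])

svr_multi_demo = MultiOutputRegressor(SVR()).fit(X_demo, y_demo)

# The fitted wrapper keeps one SVR per target column in estimators_
n_fitted = len(svr_multi_demo.estimators_)

# Predictions come back with one column per target
pred_shape = svr_multi_demo.predict(X_demo[:5]).shape
print(n_fitted, pred_shape)
```

Because every per-target SVR is a clone of the same template, all targets share whatever hyperparameters are set on the wrapped estimator.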
My goal is to tune the parameters of SVR with sklearn.model_selection.GridSearchCV. Ideally, if the response were a single variable rather than multiple, I would do something like the following:
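from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', SVR())])
# Pipeline parameters are addressed as <step name>__<parameter>
grid_param_svr = {
    'reg__C': [0.01, 0.1, 1, 10],
    'reg__epsilon': [0.1, 0.2, 0.3],
    'reg__degree': [2, 3, 4]  # only used when kernel='poly'
}
gs_svr = GridSearchCV(estimator=pipe_svr,
                      param_grid=grid_param_svr,
                      scoring='neg_mean_squared_error',
                      n_jobs=-1)
gs_svr = gs_svr.fit(X_train, y_train)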
However, since my response y_train is 2-dimensional, I need to use MultiOutputRegressor on top of SVR. How can I modify the above code to enable this GridSearchCV operation? If this is not possible, is there a better alternative?
Upvotes: 29
Views: 15229
Reputation: 1639
Thank you, Marco.
Adding to your answer, here is a short illustrative example of a randomized search applied to a multi-output GradientBoostingRegressor.
from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import RandomizedSearchCV
x, y = load_linnerud(return_X_y=True)
# Note: scikit-learn 1.0+ renamed loss 'ls'/'lad' to 'squared_error'/'absolute_error' and
# criterion 'mse' to 'squared_error', and removed min_impurity_split; the new names are used here.
model = MultiOutputRegressor(GradientBoostingRegressor(loss='squared_error', learning_rate=0.1, n_estimators=100, subsample=1.0,
                                                       criterion='friedman_mse', min_samples_split=2,
                                                       min_weight_fraction_leaf=0.0, max_depth=3,
                                                       init=None, random_state=None,
                                                       alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False,
                                                       validation_fraction=0.1, n_iter_no_change=None, tol=0.0001))
# Parameters of the wrapped regressor are addressed with the estimator__ prefix
hyperparameters = dict(estimator__learning_rate=[0.05, 0.1, 0.2, 0.5, 0.9],
                       estimator__loss=['squared_error', 'absolute_error', 'huber'],
                       estimator__n_estimators=[20, 50, 100, 200, 300, 500, 700, 1000],
                       estimator__criterion=['friedman_mse', 'squared_error'], estimator__min_samples_split=[2, 4, 7, 10],
                       estimator__max_depth=[3, 5, 10, 15, 20, 30], estimator__min_samples_leaf=[1, 2, 3, 5, 8, 10],
                       estimator__min_impurity_decrease=[0.0, 0.2, 0.4, 0.6, 0.8],
                       estimator__max_leaf_nodes=[5, 10, 20, 30, 50, 100, 300])
randomized_search = RandomizedSearchCV(model, hyperparameters, random_state=0, n_iter=5, scoring=None,
                                       n_jobs=2, refit=True, cv=5, verbose=True,
                                       pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)
hyperparameters_tuning = randomized_search.fit(x, y)
print('Best Parameters = {}'.format(hyperparameters_tuning.best_params_))
tuned_model = hyperparameters_tuning.best_estimator_
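Beyond best_params_, it can help to look at how every sampled candidate ranked. The sketch below is a deliberately small variant of the search above, not the same configuration: the two-parameter search space, cv=4 and the names x_demo, y_demo and search are assumptions chosen so that it runs quickly.

```python
import numpy as np
from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import RandomizedSearchCV

x_demo, y_demo = load_linnerud(return_X_y=True)

# Small illustrative search space (an assumption, kept tiny for speed)
search = RandomizedSearchCV(
    MultiOutputRegressor(GradientBoostingRegressor(random_state=0)),
    {'estimator__n_estimators': [10, 20, 50], 'estimator__max_depth': [2, 3]},
    n_iter=3, cv=4, random_state=0)
search.fit(x_demo, y_demo)

# cv_results_ holds one entry per sampled candidate; print them best first
results = search.cv_results_
for i in np.argsort(results['rank_test_score']):
    print(results['rank_test_score'][i], results['params'][i],
          round(float(results['mean_test_score'][i]), 3))

# The refitted best model predicts all targets at once
best_pred_shape = search.best_estimator_.predict(x_demo).shape
```

With scoring=None the search falls back to the estimator's own score method, which for a regressor is R²; passing an explicit scoring string such as 'neg_mean_squared_error' makes the criterion unambiguous.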
Upvotes: 8
Reputation: 438
To use it without a pipeline, prefix each parameter name with estimator__:
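from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
param_grid = {'estimator__min_samples_split': [10, 50],
              'estimator__min_samples_leaf': [50, 150]}
gb = GradientBoostingRegressor()
gs = GridSearchCV(MultiOutputRegressor(gb), param_grid=param_grid)
gs.fit(X, y)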
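If you are unsure which names a grid will accept, get_params() on the wrapped object lists them. This is a small sketch; the variable names wrapped, tunable, has_prefixed and has_bare are only illustrative.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

wrapped = MultiOutputRegressor(GradientBoostingRegressor())

# get_params() returns every parameter name that set_params(), and therefore
# GridSearchCV, accepts for this object
all_params = wrapped.get_params()
tunable = sorted(name for name in all_params if name.startswith('estimator__'))
print(tunable)

# The inner regressor's parameters are only reachable through the prefix
has_prefixed = 'estimator__min_samples_split' in all_params
has_bare = 'min_samples_split' in all_params
```

A grid key that is missing from this listing, such as a bare min_samples_split, is rejected as an invalid parameter when the search is fitted.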
Upvotes: 31
Reputation: 648
I just found a working solution. For nested estimators, the parameters of the inner estimator can be accessed with the estimator__ prefix:
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
pipe_svr = Pipeline([('scl', StandardScaler()),
                     ('reg', MultiOutputRegressor(SVR()))])
# <pipeline step>__estimator__<SVR parameter>
grid_param_svr = {
    'reg__estimator__C': [0.1, 1, 10]
}
gs_svr = GridSearchCV(estimator=pipe_svr,
                      param_grid=grid_param_svr,
                      scoring='neg_mean_squared_error',
                      n_jobs=-1)
gs_svr = gs_svr.fit(X_train, y_train)
# Inspect the best pipeline found by the search; its printed representation follows
gs_svr.best_estimator_
Pipeline(steps=[('scl', StandardScaler(copy=True, with_mean=True, with_std=True)),
('reg', MultiOutputRegressor(estimator=SVR(C=10, cache_size=200,
coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1,
shrinking=True, tol=0.001, verbose=False), n_jobs=1))])
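For anyone who wants to try this without the original data, here is a self-contained sketch of the same idea. The synthetic X_demo and y_demo, the extra epsilon grid and cv=3 are assumptions for illustration, not part of the original setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption): 80 samples, 3 features, 2 targets
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 3))
y_demo = np.column_stack([2.0 * X_demo[:, 0], X_demo[:, 1] - X_demo[:, 2]])

pipe = Pipeline([('scl', StandardScaler()),
                 ('reg', MultiOutputRegressor(SVR()))])

# Name chain: pipeline step 'reg' -> wrapper attribute 'estimator' -> SVR parameter
svr_param_names = [n for n in pipe.get_params() if n.startswith('reg__estimator__')]

grid = {'reg__estimator__C': [0.1, 1, 10],
        'reg__estimator__epsilon': [0.1, 0.2]}
gs = GridSearchCV(pipe, param_grid=grid, scoring='neg_mean_squared_error', cv=3)
gs.fit(X_demo, y_demo)
print(gs.best_params_)
```

Note that the selected C and epsilon are shared by both targets, because MultiOutputRegressor clones the same SVR configuration for each output.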
Upvotes: 27