Amin Kiany

Reputation: 800

Hyperparameter tuning for StackingRegressor sklearn

In my problem, I would like to tune sklearn.ensemble.StackingRegressor using a simple RandomizedSearchCV tuner. Since the estimators have to be defined when instantiating StackingRegressor(), I couldn't figure out how to properly define the parameter space for those estimators in the param_distributions of RandomizedSearchCV.

I tried the following and got an error:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.svm import LinearSVR
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.ensemble import StackingRegressor
X, y = load_diabetes(return_X_y=True)

rfr = RandomForestRegressor()
gbr = GradientBoostingRegressor()

estimators = [rfr, gbr]
sreg = StackingRegressor(estimators=estimators)
params = {'rfr__max_depth': [3, 5, 10, 100],
          'gbr__max_depth': [3, 5, 10, 100]}

grid = RandomizedSearchCV(estimator=sreg, 
                          param_distributions=params,
                          cv=3)
grid.fit(X,y)

This fails with the error AttributeError: 'RandomForestRegressor' object has no attribute 'estimators_'.

Is there any way to tune the parameters of the different estimators within a StackingRegressor?

Upvotes: 0

Views: 1575

Answers (1)

user11989081

Reputation: 8663

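If you define your estimators as a list of (name, estimator) tuples, as shown below, your code should work. The names you choose ('rfr' and 'gbr' here) are the prefixes that the keys in param_distributions refer to, so 'rfr__max_depth' sets max_depth on the random forest.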

import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.ensemble import StackingRegressor

X, y = load_diabetes(return_X_y=True)

rfr = RandomForestRegressor()
gbr = GradientBoostingRegressor()

estimators = [('rfr', rfr), ('gbr', gbr)]

sreg = StackingRegressor(estimators=estimators)

params = {
    'rfr__max_depth': [3, 5],
    'gbr__max_depth': [3, 5]
}

grid = RandomizedSearchCV(
    estimator=sreg,
    param_distributions=params,
    n_iter=2,
    cv=3,
    verbose=1,
    random_state=100
)

grid.fit(X, y)

res = pd.DataFrame(grid.cv_results_)
print(res)
#    mean_fit_time  std_fit_time  ...  std_test_score  rank_test_score
# 0       1.121728      0.024188  ...        0.024546                2
# 1       1.096936      0.034377  ...        0.013047                1
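A possible extension, not part of the original answer: the keys accepted in param_distributions are the nested parameter names reported by get_params() on the stacked model, so you can check them before starting the search. The same naming scheme also lets you tune the meta-model through the final_estimator__ prefix. The sketch below assumes a Ridge final estimator, scipy.stats.randint distributions, and small n_estimators values to keep it fast; these are illustrative choices, not requirements.

```python
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

# Base estimators as (name, estimator) pairs; the names become parameter prefixes.
# Ridge as the final estimator is an illustrative assumption.
sreg = StackingRegressor(
    estimators=[
        ("rfr", RandomForestRegressor(n_estimators=20, random_state=0)),
        ("gbr", GradientBoostingRegressor(n_estimators=20, random_state=0)),
    ],
    final_estimator=Ridge(),
)

# Search space: base-estimator params use their tuple names,
# the meta-model's params use the "final_estimator__" prefix.
params = {
    "rfr__max_depth": randint(2, 11),  # samples integers 2..10
    "gbr__max_depth": randint(2, 6),   # samples integers 2..5
    "final_estimator__alpha": [0.1, 1.0, 10.0],
}

# Sanity check: every key must be a parameter name the stacked model exposes.
missing = set(params) - set(sreg.get_params())
assert not missing, f"unknown parameter names: {missing}"

search = RandomizedSearchCV(
    sreg,
    param_distributions=params,
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Checking the keys against get_params() up front turns a typo in a parameter name into an immediate, readable error instead of a failure partway through the search.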

Upvotes: 2
