Lijin Durairaj

Reputation: 5240

How to pass different values to Pipeline Parameters

Suppose I am tuning the hyperparameters of a model. Say I am using AdaBoostClassifier() and want to try different values for base_estimator, so I pass SVC and DecisionTreeClassifier as candidates:

_parameters=[
         {
                'mdl':[AdaBoostClassifier(random_state=23)],
                'mdl__learning_rate':np.linspace(0,1,20),
                'mdl__base_estimator':[SVC(),DecisionTreeClassifier()]
         }         
            ]

Now I want to try different values of ccp_alpha for the DecisionTreeClassifier, something like this:

'mdl__base_estimator':[LinearRegression(),DecisionTreeClassifier(ccp_alpha=[0.1,0.2,0.3,0.4])]

How can I do that? I tried passing it like this, but it does not work. Here is my entire code:

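import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# named _pipeline so that it matches the GridSearchCV call further down
_pipeline=Pipeline(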
    [
     ('scal',StandardScaler()),
     ('mdl','passthrough')
    ]
)

_parameters=[
              {
              'mdl':[DecisionTreeClassifier(random_state=42)]   ,
               'mdl__max_depth':np.linspace(2,30,2),
               'mdl__min_samples_split':np.linspace(1,10,1),
               'mdl__max_features':np.linspace(1,100,1),
               'mdl__ccp_alpha':np.linspace(0,1,10)
             }
          ,{
                'mdl':[AdaBoostClassifier(random_state=23)],
                'mdl__learning_rate':np.linspace(0,1,20),
                'mdl__base_estimator':[SVC(),DecisionTreeClassifier(ccp_alpha=[0.3,0.4,0.5,0.7])]
            }         
]

grid_search=GridSearchCV(_pipeline,_parameters,cv=3,n_jobs=-1,scoring='f1')
grid_search.fit(x,y)

Upvotes: 0

Views: 187

Answers (1)

Ben Reiniger

Reputation: 12698

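This kind of splitting is why param_grid can be a list of dicts, as in your outer split, but it cannot easily handle the nested disjunction you have. Writing DecisionTreeClassifier(ccp_alpha=[0.3,0.4,0.5,0.7]) does not help, because the whole list is stored as a single ccp_alpha value; only the lists in param_grid are expanded into candidates. Two approaches come to mind.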

More disjoint grids, addressing the nested parameter with a double underscore as mdl__base_estimator__ccp_alpha:

_parameters=[
    {
        'mdl': [DecisionTreeClassifier(random_state=42)],
        'mdl__max_depth': np.linspace(2,30,2),
        'mdl__min_samples_split': np.linspace(1,10,1),
        'mdl__max_features': np.linspace(1,100,1),
        'mdl__ccp_alpha': np.linspace(0,1,10),
    },
    {
        'mdl': [AdaBoostClassifier(random_state=23)],
        'mdl__learning_rate': np.linspace(0,1,20),
        'mdl__base_estimator': [SVC()],
    },
    {
        'mdl': [AdaBoostClassifier(random_state=23)],
        'mdl__learning_rate': np.linspace(0,1,20),
        'mdl__base_estimator': [DecisionTreeClassifier()],
        'mdl__base_estimator__ccp_alpha': [0.3,0.4,0.5,0.7],
    },
]
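One way to sanity-check a grid like this before running the search is to expand it with ParameterGrid and apply one candidate to the pipeline. This is only a minimal sketch, not part of the original code: the learning_rate entries are left out to keep the count small, and the `key` lookup is an assumption to cover newer scikit-learn releases, where the base_estimator argument of AdaBoostClassifier was renamed to estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: pick whichever argument name this scikit-learn version exposes.
key = "estimator" if "estimator" in AdaBoostClassifier().get_params() else "base_estimator"

pipe = Pipeline([("scal", StandardScaler()), ("mdl", "passthrough")])

grids = [
    {
        "mdl": [AdaBoostClassifier(random_state=23)],
        f"mdl__{key}": [SVC()],
    },
    {
        "mdl": [AdaBoostClassifier(random_state=23)],
        f"mdl__{key}": [DecisionTreeClassifier()],
        # nested parameter: pipeline step -> AdaBoost argument -> tree argument
        f"mdl__{key}__ccp_alpha": [0.3, 0.4, 0.5, 0.7],
    },
]

# Expand the grids the same way GridSearchCV would, without fitting anything.
candidates = list(ParameterGrid(grids))
n_candidates = len(candidates)

# Apply the last candidate to confirm the nested key reaches the tree.
pipe.set_params(**candidates[-1])
tree = getattr(pipe.named_steps["mdl"], key)
print(n_candidates, type(tree).__name__, tree.ccp_alpha)
```

The first dict contributes one candidate and the second contributes four, so this reduced grid yields five configurations per fold.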

Or a list comprehension that builds one estimator per ccp_alpha value:

_parameters=[
    {
        'mdl': [DecisionTreeClassifier(random_state=42)],
        'mdl__max_depth': np.linspace(2,30,2),
        'mdl__min_samples_split': np.linspace(1,10,1),
        'mdl__max_features': np.linspace(1,100,1),
        'mdl__ccp_alpha': np.linspace(0,1,10),
    },
    {
        'mdl': [AdaBoostClassifier(random_state=23)],
        'mdl__learning_rate': np.linspace(0,1,20),
        'mdl__base_estimator': [SVC()] + [DecisionTreeClassifier(ccp_alpha=a) for a in [0.3,0.4,0.5,0.7]],
    },
]

Upvotes: 1
