Clash Secondo

Reputation: 29

Multiple parameters in an imblearn Pipeline

I have a strongly imbalanced dataset, and I want to try different decision function shapes for SVM classification ('ovo' and 'ovr') together with different oversampling methods (RandomOverSampler and SMOTE). I wrote the code below, but is there a more compact way to do this?

import itertools

import numpy as np
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# X_train, y_train and X_test are assumed to be defined earlier
decision_shapes = ['ovo', 'ovr']
samplers = [SMOTE(k_neighbors=3), RandomOverSampler()]

# One full grid search per (decision shape, sampler) combination
for decision_shape, sampler in itertools.product(decision_shapes, samplers):

    model = SVC(decision_function_shape=decision_shape, probability=True)

    PipelineIMB = Pipeline([
        ('smote', sampler),
        ('svm', model)
    ])

    # Define search space
    param_grid = {
        'svm__C': np.arange(1, 20, 1),
        'svm__kernel': ['linear', 'poly', 'rbf', 'sigmoid']
    }

    kf = KFold(n_splits=10, random_state=42, shuffle=True)

    grid_imba = GridSearchCV(PipelineIMB, param_grid, cv=kf, scoring='f1_macro',
                             verbose=10, n_jobs=-1, error_score='raise')

    grid_imba.fit(X_train, y_train)
    y_pred = grid_imba.predict(X_test)

Upvotes: 0

Views: 152

Answers (1)

Ben Reiniger

Reputation: 12698

If your goal is to compare those options and keep only the best one as the refitted model, you can pack everything into a single GridSearchCV: a parameter grid can replace entire pipeline steps, not just their hyperparameters.

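import numpy as np
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

model = SVC(probability=True)

# 'passthrough' is a placeholder; the grid swaps in the actual samplers
PipelineIMB = Pipeline([
    ('sample', 'passthrough'),
    ('svm', model),
])

param_grid = {
    'sample': [
        SMOTE(k_neighbors=3),
        RandomOverSampler(),
    ],  # these replace the original definition of "passthrough"
    'svm__decision_function_shape': ['ovo', 'ovr'],
    'svm__C': np.arange(1, 20, 1),
    'svm__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
}

kf = KFold(n_splits=10, random_state=42, shuffle=True)

grid_imba = GridSearchCV(PipelineIMB, param_grid, cv=kf, scoring='f1_macro',
                         verbose=10, n_jobs=-1, error_score='raise')

# X_train and y_train are assumed to be defined earlier
grid_imba.fit(X_train, y_train)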

Upvotes: 1
