SKLearn Error with Pipeline and Gridsearch

Question

I would like to first split my data into a training set and a test set. Then I want to run GridSearchCV on the training set (internally split into train/validation sets). In the end I want to collect all the test data and do some other things (outside the scope of this question).

I have to scale my data, so I want to handle this in a pipeline. Some parameters of my SVC should be fixed (kernel='rbf', class_weight=...). When I run the code, the following error occurs:

"ValueError: Invalid parameter estimator for estimator Pipeline"

I don't understand what I'm doing wrong. I tried to follow this thread: StandardScaler with Pipelines and GridSearchCV

The only difference is that I fix some parameters in my SVC. How can I handle this?

import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# input (feature array) and target (labels) are loaded elsewhere
target = np.array(target).ravel()
loo = LeaveOneOut()
loo.get_n_splits(input)

# Outer loop: hold out one sample at a time as the test set
for train_index, test_index in loo.split(input):
    X_train, X_test = input[train_index], input[test_index]
    y_train, y_test = target[train_index], target[test_index]
    p_grid = {'estimator__C': np.logspace(-5, 2, 20),
              'estimator__gamma': np.logspace(-5, 3, 20)}

    SVC_Kernel = SVC(kernel='rbf', class_weight='balanced', tol=10e-4, max_iter=200000, probability=False)
    pipe_SVC = Pipeline([('scaler', RobustScaler()), ('SVC', SVC_Kernel)])
    n_splits = 5
    scoring = "f1_micro"

    # Inner loop: stratified k-fold on the training part only
    inner_cv = StratifiedKFold(n_splits=n_splits,
                               shuffle=True, random_state=5)
    # Note: iid was removed in scikit-learn 0.24; drop it on newer versions
    clfSearch = GridSearchCV(estimator=pipe_SVC, param_grid=p_grid,
                             cv=inner_cv, scoring=scoring, iid=False, n_jobs=-1)

    clfSearch.fit(X_train, y_train)  # raises the ValueError quoted above

    print("Best parameters set found on validation set for Support Vector Machine:")
    print()
    print(clfSearch.best_params_)
    print()
    print(clfSearch.best_score_)
    print("Grid scores on validation set:")
    print()

I also tried it this way:

p_grid = {'estimator__C': np.logspace(-5, 2, 20),
          'estimator__gamma': np.logspace(-5, 3, 20),
          'estimator__tol': [10e-4],
          'estimator__kernel': ['rbf'],
          'estimator__class_weight': ['balanced'],
          'estimator__max_iter': [200000],
          'estimator__probability': [False]}

SVC_Kernel = SVC()

This also doesn't work.

Shihab Shahriar Khan · Accepted Answer

The problem is in your p_grid. You are grid searching over your Pipeline, and the Pipeline has no step or parameter called estimator. It does have a step called SVC, so if you want to set that SVC's parameters, you should prefix your keys with SVC__ instead of estimator__. So replace p_grid with:

p_grid = {'SVC__C': np.logspace(-5, 2, 20),
          'SVC__gamma': np.logspace(-5, 3, 20)}
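
To see why the prefix matters, here is a minimal sketch that lists the parameter names the Pipeline actually accepts and then runs the search with the corrected keys. The synthetic data from make_classification and the small grid are assumptions made only so the example runs quickly; they are not the asker's data or grid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; the question's data is not shown).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# Fixed SVC settings go in the constructor, not in the grid.
pipe_SVC = Pipeline([
    ("scaler", RobustScaler()),
    ("SVC", SVC(kernel="rbf", class_weight="balanced")),
])

# get_params() lists every name GridSearchCV may set on the pipeline.
# Step parameters appear as "<step name>__<parameter>".
valid_names = sorted(pipe_SVC.get_params().keys())
has_svc_c = "SVC__C" in valid_names              # key built from the step name
has_estimator_c = "estimator__C" in valid_names  # key used in the question

# A deliberately small grid so the sketch finishes quickly.
p_grid = {"SVC__C": np.logspace(-2, 2, 3),
          "SVC__gamma": np.logspace(-3, 1, 3)}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=5)
clfSearch = GridSearchCV(pipe_SVC, param_grid=p_grid,
                         cv=inner_cv, scoring="f1_micro")
clfSearch.fit(X, y)
best_params = clfSearch.best_params_
print(best_params)
```

If the step had been named 'clf' in the Pipeline, the keys would be 'clf__C' and 'clf__gamma'. The prefix always follows the name given in the step tuple.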

Also, you can replace your outer for loop with the cross_validate function.
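
A rough sketch of that idea, assuming the same kind of pipeline and a leave-one-out outer split as in the question: the GridSearchCV object is itself passed to cross_validate, so the hyperparameter search is rerun on each outer training split. The synthetic data, the small grid, and the use of accuracy as the outer score are assumptions made to keep the example short and fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, LeaveOneOut,
                                     StratifiedKFold, cross_validate)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; the question's data is not shown).
X, y = make_classification(n_samples=30, n_features=5, random_state=0)

pipe_SVC = Pipeline([
    ("scaler", RobustScaler()),
    ("SVC", SVC(kernel="rbf", class_weight="balanced")),
])

# Small grid (an assumption) so the nested search stays cheap.
p_grid = {"SVC__C": [0.1, 10.0], "SVC__gamma": [0.01, 1.0]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=5)
search = GridSearchCV(pipe_SVC, param_grid=p_grid,
                      cv=inner_cv, scoring="f1_micro")

# Outer loop handled by cross_validate: each held-out sample is scored by
# a search that was fitted on the remaining samples only.
results = cross_validate(search, X, y, cv=LeaveOneOut(),
                         scoring="accuracy", return_estimator=True)

test_scores = results["test_score"]  # one 0/1 score per held-out sample
fitted_searches = results["estimator"]  # one fitted GridSearchCV per split
print(test_scores.mean())
```

With return_estimator=True, the best_params_ chosen on each outer split can be read from the fitted searches in results["estimator"]. If the held-out predictions themselves are needed, cross_val_predict from the same module may be a better fit.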
