Redratz

Reputation: 136

Simultaneous feature selection and hyperparameter tuning

I'm trying to perform both hyperparameter tuning and feature selection on a scikit-learn SVC model.

I tried the code below, but I get the error included further down.

import numpy as np
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (required before importing HalvingGridSearchCV)
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

clf = Pipeline([('anova', SelectPercentile(f_classif)),
                ('svc',  SVC(probability = True))])

score_means = list()
score_params = list()
percentiles = (1, 3, 6, 10, 15, 20, 30, 40, 60, 80, 100)

params = {
    "C": np.logspace(-3, 17, 21),
    "gamma": np.logspace(-20, 1, 21),
    'class_weight' : [None, 'balanced']
    }

halving_search = HalvingGridSearchCV(estimator = clf,
                                     param_grid = params,
                                     scoring = 'neg_brier_score',
                                     factor = 2,
                                     verbose = 2,
                                     cv = 2)


for percentile in percentiles:
    clf.set_params(anova__percentile=percentile)
    this_scores = halving_search.fit(x_train, y_train)
    score_means.append(this_scores.best_score_)
    score_params.append(this_scores.best_params_)

Running the pipeline on its own with cross_val_score (without HalvingGridSearchCV) works, but I want to perform feature selection and hyperparameter tuning together, to find the combination of feature percentile and hyperparameters that produces the best model.

When I run the above code, I get the following error:

Traceback (most recent call last):
  File "<ipython-input-83-cf714445297c>", line 4, in <module>
    this_scores = halving_search.fit(x_train, y_train)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search_successive_halving.py", line 213, in fit
    super().fit(X, y=y, groups=None, **fit_params)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 841, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search_successive_halving.py", line 320, in _run_search
    more_results=more_results)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 809, in evaluate_candidates
    enumerate(cv.split(X, y, groups))))
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 581, in _fit_and_score
    estimator = estimator.set_params(**cloned_parameters)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 150, in set_params
    self._set_params('steps', **kwargs)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py", line 54, in _set_params
    super().set_params(**params)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\base.py", line 233, in set_params
    (key, self))
ValueError: Invalid parameter C for estimator Pipeline(steps=[('anova', SelectPercentile(percentile=1)),
                ('svc', SVC(probability=True))]). Check the list of available parameters with `estimator.get_params().keys()`.

It looks as if HalvingGridSearchCV is trying to pass the pipeline as an input for C.

Upvotes: 0

Views: 495

Answers (1)

afsharov

Reputation: 5164

You want to perform a grid search over a Pipeline object. When defining parameters for the individual steps of a pipeline, you have to use the <step>__<parameter> syntax (step name, two underscores, parameter name). Your SVC step is named svc, so:

params = {
    "svc__C": np.logspace(-3, 17, 21),
    "svc__gamma": np.logspace(-20, 1, 21),
    "svc__class_weight" : [None, 'balanced']
}
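The error message itself suggests checking `estimator.get_params().keys()`. A minimal check along those lines, rebuilding the question's pipeline without any data, shows that the pipeline exposes the SVC's hyperparameters only under the `svc__` prefix, and that `set_params` (the call that fails in the traceback) rejects a bare `C`:

```python
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Same two-step pipeline as in the question
clf = Pipeline([("anova", SelectPercentile(f_classif)),
                ("svc", SVC(probability=True))])

param_names = set(clf.get_params().keys())
print("C" in param_names)        # False: the pipeline itself has no parameter C
print("svc__C" in param_names)   # True: C of the step named "svc"

# The search calls set_params for every candidate; a bare "C" is rejected...
try:
    clf.set_params(C=1.0)
except ValueError as exc:
    print("ValueError:", str(exc).splitlines()[0])

# ...while the prefixed name is routed to the SVC step
clf.set_params(svc__C=1.0)
print(clf.named_steps["svc"].C)  # 1.0
```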

See the Pipeline section of the scikit-learn user guide for more information.
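Since the SelectPercentile percentile is also a step parameter (`anova__percentile`), it can go into the same grid instead of being set in an outer loop, so a single search compares feature-selection and hyperparameter combinations. The sketch below is not part of the original answer and makes several assumptions: synthetic data from `make_classification` stands in for `x_train`/`y_train`, the grid is much smaller than the question's, and `min_resources=100` is an illustrative choice so that the first halving round does not train on only a handful of samples. Note that `HalvingGridSearchCV` is experimental and needs the `enable_halving_search_cv` import first.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic binary stand-in for the question's x_train / y_train (assumption)
x_train, y_train = make_classification(n_samples=400, n_features=20,
                                       random_state=0)

clf = Pipeline([("anova", SelectPercentile(f_classif)),
                ("svc", SVC(probability=True, random_state=0))])

# The feature-selection percentile is just another grid dimension,
# so no outer loop over percentiles is needed. Values are illustrative.
params = {
    "anova__percentile": [10, 50, 100],
    "svc__C": np.logspace(-1, 1, 3),
    "svc__gamma": np.logspace(-3, -1, 3),
    "svc__class_weight": [None, "balanced"],
}

halving_search = HalvingGridSearchCV(estimator=clf,
                                     param_grid=params,
                                     scoring="neg_brier_score",
                                     factor=2,
                                     min_resources=100,  # assumed value, see lead-in
                                     cv=2,
                                     random_state=0)
halving_search.fit(x_train, y_train)

print(halving_search.best_params_)
print(halving_search.best_score_)
```

Because the percentile is searched jointly, `best_params_` reports the chosen `anova__percentile` alongside `svc__C`, `svc__gamma` and `svc__class_weight`, and `cv_results_` holds the scores of every candidate evaluated.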

Upvotes: 1
