Reputation: 136
I'm trying to do both hyperparameter tuning and feature selection on a sklearn SVC model.
I tried the code below, but it raises the error shown after it.
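import numpy as np
from sklearn.experimental import enable_halving_search_cv  # noqa: F401, needed before importing HalvingGridSearchCV
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# ANOVA feature selection followed by an SVC that can output probabilities
clf = Pipeline([('anova', SelectPercentile(f_classif)),
                ('svc', SVC(probability=True))])
score_means = list()
score_params = list()
percentiles = (1, 3, 6, 10, 15, 20, 30, 40, 60, 80, 100)
params = {
    "C": np.logspace(-3, 17, 21),
    "gamma": np.logspace(-20, 1, 21),
    'class_weight': [None, 'balanced']
}
halving_search = HalvingGridSearchCV(estimator=clf,
                                     param_grid=params,
                                     scoring='neg_brier_score',
                                     factor=2,
                                     verbose=2,
                                     cv=2)
# Re-run the search once per feature-selection percentile
for percentile in percentiles:
    clf.set_params(anova__percentile=percentile)
    this_scores = halving_search.fit(x_train, y_train)
    score_means.append(this_scores.best_score_)
    score_params.append(this_scores.best_params_)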
Running the pipeline with cross_val_score, separately from HalvingGridSearchCV, works, but I want to do feature selection and hyperparameter tuning together to find which combination of features and hyperparameters produces the best model.
When I run the above code, I get the following error:
Traceback (most recent call last):
  File "<ipython-input-83-cf714445297c>", line 4, in <module>
    this_scores = halving_search.fit(x_train, y_train)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search_successive_halving.py", line 213, in fit
    super().fit(X, y=y, groups=None, **fit_params)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 841, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search_successive_halving.py", line 320, in _run_search
    more_results=more_results)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 809, in evaluate_candidates
    enumerate(cv.split(X, y, groups))))
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "C:\Users\fredd\Anaconda3\lib\site-packages\joblib\parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
    return self.function(*args, **kwargs)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 581, in _fit_and_score
    estimator = estimator.set_params(**cloned_parameters)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 150, in set_params
    self._set_params('steps', **kwargs)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\utils\metaestimators.py", line 54, in _set_params
    super().set_params(**params)
  File "C:\Users\fredd\Anaconda3\lib\site-packages\sklearn\base.py", line 233, in set_params
    (key, self))
ValueError: Invalid parameter C for estimator Pipeline(steps=[('anova', SelectPercentile(percentile=1)),
                ('svc', SVC(probability=True))]). Check the list of available parameters with `estimator.get_params().keys()`.
It reads as if HalvingGridSearchCV is trying to pass the pipeline as an input for C.
Upvotes: 0
Views: 495
Reputation: 5164
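You want to perform a grid search over a Pipeline object, so every key in the parameter grid has to name the step it belongs to, using the <step>__<parameter> syntax. An unprefixed key such as "C" is looked up on the Pipeline itself, which has no such parameter, and that is what raises the ValueError. Prefix the SVC parameters with the step name you gave it ('svc'):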
params = {
    "svc__C": np.logspace(-3, 17, 21),
    "svc__gamma": np.logspace(-20, 1, 21),
    "svc__class_weight": [None, 'balanced']
}
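If you are unsure which names are valid, the error message's own hint can be followed directly. The short check below rebuilds the pipeline from the question and lists the parameter names it accepts; the variable names valid_names and svc_names are only illustrative.

```python
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Same pipeline as in the question
clf = Pipeline([("anova", SelectPercentile(f_classif)),
                ("svc", SVC(probability=True))])

# Every name that set_params (and therefore any grid search) will accept
valid_names = sorted(clf.get_params().keys())

# Parameters that belong to the 'svc' step all carry the 'svc__' prefix
svc_names = [name for name in valid_names if name.startswith("svc__")]
print(svc_names)

# The bare name is rejected, the prefixed one is accepted
print("C" in valid_names, "svc__C" in valid_names)  # False True
```

The same listing also shows anova__percentile, which is the name the loop in the question already uses with set_params.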
See the user guide for more information.
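Since the percentile of the feature selector is just another pipeline parameter, one possible alternative to looping over percentiles is to put anova__percentile into the same grid, so that a single search covers feature selection and the SVC hyperparameters together. The sketch below is only an illustration under several assumptions of mine: it uses synthetic data from make_classification in place of your x_train and y_train, a much smaller grid than yours so that it runs quickly, and min_resources=100 so the first halving round does not train on a handful of samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for x_train / y_train (assumption, not the asker's data)
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, random_state=0)

clf = Pipeline([("anova", SelectPercentile(f_classif)),
                ("svc", SVC(probability=True))])

# Reduced grid for illustration; every key is prefixed with its step name
params = {
    "anova__percentile": [10, 30, 100],
    "svc__C": np.logspace(-1, 1, 3),
    "svc__gamma": np.logspace(-3, -1, 3),
    "svc__class_weight": [None, "balanced"],
}

search = HalvingGridSearchCV(estimator=clf,
                             param_grid=params,
                             scoring="neg_brier_score",
                             factor=2,
                             cv=2,
                             min_resources=100,  # assumption: avoid tiny first rounds
                             random_state=0)
search.fit(X, y)

# Best combination of percentile and SVC settings found by the search
print(search.best_params_)
print(search.best_score_)
```

With your real data you would pass x_train and y_train to fit and restore your original percentiles and logspace ranges. Keep in mind that the grid grows multiplicatively, so the full ranges from the question will take considerably longer.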
Upvotes: 1