Masmm

Reputation: 37

Problem with SelectKBest method in pipeline

I am trying to solve a classification problem using the KNN algorithm. When I added SelectKBest to my pipeline, I got the error below:

All intermediate steps should be transformers and implement fit and transform.

I don't know whether this selection method can be used with KNN, but I tried SVM as well and got the same error. Here is my code:

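from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier as kn  # assumed meaning of the alias kn
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler as ss  # assumed meaning of the alias ss

sel = SelectKBest(chi2, k=3)  # score_func must be a callable, not the string 'chi2'
clf = kn()
s = ss()
step = [('scaler', s), ('kn', clf), ('sel', sel)]
pipeline = Pipeline(step)
parameter = {'kn__n_neighbors': range(1, 40, 1), 'kn__weights': ['uniform', 'distance'], 'kn__p': [1, 2]}
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # random_state only takes effect with shuffle=True
grid = GridSearchCV(pipeline, param_grid=parameter, cv=kfold, scoring='accuracy', n_jobs=-1)
grid.fit(x_train, y_train)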

Upvotes: 0

Views: 904

Answers (2)

desertnaut

Reputation: 60370

The order of the operations in the pipeline, as determined in steps, matters; from the docs:

steps : list

List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.

The error is due to adding SelectKBest as the last element of your pipeline:

step = [('scaler', s), ('kn', clf), ('sel',sel)]

SelectKBest is a transformer rather than a final predictor, and, as the error message says, the intermediate step kn is a classifier with no transform method.
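One quick way to see which objects can serve as intermediate steps is to check whether they expose a transform method. The sketch below assumes that kn and ss in the question stand for KNeighborsClassifier and StandardScaler (the question does not show its imports), and the dictionary names are only illustrative.

```python
from sklearn.feature_selection import SelectKBest
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumed stand-ins for the objects named s, sel and clf in the question.
candidates = {
    "scaler": StandardScaler(),
    "sel": SelectKBest(),
    "kn": KNeighborsClassifier(),
}

# An intermediate pipeline step needs both fit and transform.
has_transform = {name: hasattr(obj, "transform") for name, obj in candidates.items()}

for name, flag in has_transform.items():
    print(f"{name}: transform available = {flag}")
```

The scaler and the selector both report a transform method, while the classifier does not, which is why it can only sit at the end of the pipeline.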

I guess you don't really want to perform feature selection after you have fitted the model...

Change it to:

step = [('scaler', s), ('sel', sel), ('kn', clf)]

and you should be fine.
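For reference, here is a minimal end-to-end sketch of the reordered pipeline. Several things in it are assumptions and not part of the question: the iris data stands in for the unseen x_train and y_train, kn and ss are taken to mean KNeighborsClassifier and StandardScaler, and n_jobs is left at its default. The score function is also swapped from chi2 to f_classif, because chi2 rejects negative inputs and StandardScaler produces them; keeping chi2 would require a non-negative scaling such as MinMaxScaler. Recent scikit-learn versions also reject random_state in StratifiedKFold unless shuffle=True, so that is set here.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): the question's x_train / y_train are not shown.
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Transformers first, classifier last.
steps = [
    ("scaler", StandardScaler()),
    ("sel", SelectKBest(f_classif, k=3)),  # f_classif swapped in for chi2
    ("kn", KNeighborsClassifier()),
]
pipeline = Pipeline(steps)

# Same grid as in the question; the "kn__" prefix routes to the classifier step.
parameter = {
    "kn__n_neighbors": range(1, 40),
    "kn__weights": ["uniform", "distance"],
    "kn__p": [1, 2],
}

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(pipeline, param_grid=parameter, cv=kfold, scoring="accuracy")
grid.fit(x_train, y_train)

print(grid.best_params_)
print(grid.score(x_test, y_test))
```

Because the selector sits inside the pipeline, it is refitted on each training fold during the grid search, so the feature selection never sees the validation fold.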

Upvotes: 1

Masmm

Reputation: 37

I didn't think the order of the pipeline steps mattered, but it turns out that every step before the last must implement fit and transform, and the final step should be the estimator. I changed the order so that clf comes last, and the problem is solved.

Upvotes: 0
