Reputation: 37
I am trying to solve a classification problem using the KNN algorithm. I decided to add SelectKBest to my pipeline,
but I get the error below:
All intermediate steps should be transformers and implement fit and transform.
I don't know whether I can use this feature selection method with KNN, but I tried it with SVM as well and got the same error. Here is my code:
sel = SelectKBest('chi2',k = 3)
clf = kn()
s = ss()
step = [('scaler', s), ('kn', clf), ('sel',sel)]
pipeline = Pipeline(step)
parameter = {'kn__n_neighbors':range(1,40,1), 'kn__weights':['uniform','distance'], 'kn__p':[1,2] }
kfold = StratifiedKFold(n_splits=5, random_state=0)
grid = GridSearchCV(pipeline, param_grid = parameter, cv=kfold, scoring = 'accuracy', n_jobs = -1)
grid.fit(x_train, y_train)
Upvotes: 0
Views: 904
Reputation: 60370
The order of the operations in the pipeline, as determined in steps, matters; from the docs:
steps : list
List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.
The error is due to adding SelectKBest as the last element of your pipeline:
step = [('scaler', s), ('kn', clf), ('sel',sel)]
which is a transformer rather than a final predictor; the message itself is triggered by your intermediate step kn, which is a classifier and does not implement transform.
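A minimal sketch of how the error arises. It assumes that kn and ss in the question stand for KNeighborsClassifier and StandardScaler (the question does not show its imports). The iris data and f_classif are used purely for illustration and are not from the question:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data only; the asker's x_train / y_train are not available.
X, y = load_iris(return_X_y=True)

# Same ordering as in the question: the classifier sits in the middle.
steps = [
    ("scaler", StandardScaler()),
    ("kn", KNeighborsClassifier()),        # no transform() method
    ("sel", SelectKBest(f_classif, k=3)),
]

error_message = None
try:
    # Depending on the scikit-learn version, the step check may run when the
    # Pipeline is constructed or when fit() is called, so both are wrapped.
    Pipeline(steps).fit(X, y)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The message names the offending step, which should be the classifier in the middle of the pipeline, not the selector at the end.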
I guess you don't really want to perform feature selection after you have fitted the model...
Change it to:
step = [('scaler', s), ('sel', sel), ('kn', clf)]
and you should be fine.
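For reference, a self-contained sketch of the reordered pipeline inside a grid search. Several things here are assumptions rather than part of the original post: the iris data stands in for x_train / y_train, kn and ss are taken to mean KNeighborsClassifier and StandardScaler, and the parameter grid is shortened. Two further changes may be needed beyond the reordering. First, score_func has to be a callable, not the string 'chi2'. Second, chi2 expects non-negative features, which standardized data does not provide, so f_classif is swapped in here. Recent scikit-learn versions also reject random_state in StratifiedKFold unless shuffle=True is set.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data only (assumption); replace with your own training set.
x_train, y_train = load_iris(return_X_y=True)

# Transformers first, the classifier last.
steps = [
    ("scaler", StandardScaler()),
    ("sel", SelectKBest(f_classif, k=3)),  # callable score function
    ("kn", KNeighborsClassifier()),
]
pipeline = Pipeline(steps)

# Shortened grid (assumption) to keep the example quick to run.
parameter = {
    "kn__n_neighbors": range(1, 11),
    "kn__weights": ["uniform", "distance"],
    "kn__p": [1, 2],
}

# shuffle=True is required for random_state to have any effect.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(pipeline, param_grid=parameter, cv=kfold, scoring="accuracy")
grid.fit(x_train, y_train)

print(grid.best_params_)
print(grid.best_score_)
```

Because the selector is a pipeline step, its k can be tuned as well by adding a key such as "sel__k" to the grid.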
Upvotes: 1
Reputation: 37
So, I didn't think the order of the pipeline steps was important, but then I found out that every intermediate step has to implement fit/transform and that the estimator has to come last. I changed the order of the pipeline, making clf the last step, and the problem is solved.
Upvotes: 0