I am trying to use imblearn's SMOTE inside a scikit-learn Pipeline. My datasets are text data stored in a pandas DataFrame. Here is the code snippet:
text_clf =Pipeline([('vect', TfidfVectorizer()),('scale', StandardScaler(with_mean=False)),('smt', SMOTE(random_state=5)),('clf', LinearSVC(class_weight='balanced'))])
After this, I use GridSearchCV:
grid = GridSearchCV(text_clf, parameters, cv=4, n_jobs=-1, scoring = 'accuracy')
where the parameters grid mostly tunes TfidfVectorizer() options. I get the following error:
All intermediate steps should be transformers and implement fit and transform. 'SMOTE
After getting this error, I changed the code as follows:
vect = TfidfVectorizer(use_idf=True,smooth_idf = True, max_df = 0.25, sublinear_tf = True, ngram_range=(1,2))
X = vect.fit_transform(X).todense()
Y = vect.fit_transform(Y).todense()
X_Train,X_Test,Y_Train,y_test = train_test_split(X,Y, random_state=0, test_size=0.33, shuffle=True)
text_clf =make_pipeline([('smt', SMOTE(random_state=5)),('scale', StandardScaler(with_mean=False)),('clf', LinearSVC(class_weight='balanced'))])
grid = GridSearchCV(text_clf, parameters, cv=4, n_jobs=-1, scoring = 'accuracy')
where the parameters grid only tunes C of the LinearSVC classifier.
This time I am getting the following error:
Last step of Pipeline should implement fit.SMOTE(....) doesn't
What is going on here? Can anyone please help?
imblearn's SMOTE has no transform method (see the SMOTE documentation). However, every step of a scikit-learn Pipeline except the last must implement both fit and transform.
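To use SMOTE with an sklearn Pipeline, you could implement a custom transformer that calls SMOTE.fit_resample() (named fit_sample() in older imblearn releases) in its transform method. Be aware, though, that a transformer can only return the transformed X: sklearn's Pipeline keeps passing the original y to the following steps, so the resampled labels are lost and the sample counts no longer match.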
A simpler option is to use imblearn's own Pipeline, which accepts samplers as intermediate steps:
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as imbPipeline
from sklearn.svm import LinearSVC
# This doesn't work with sklearn.pipeline.Pipeline because
# SMOTE doesn't have a .transform() method.
# (It only resamples, via .fit_resample(); older releases
# called this .fit_sample() or .sample().)
pipe = imbPipeline([
    ...
    ('oversample', SMOTE(random_state=5)),
    ('clf', LinearSVC(class_weight='balanced'))
])