Alex Ryu

Reputation: 61

Trying to use imblearn.pipeline with RandomOverSampler and DecisionTreeClassifier

I am trying to tune the hyperparameters of a DecisionTreeClassifier using GridSearchCV, and because my data is imbalanced, I am trying to use imblearn.over_sampling.RandomOverSampler.

import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler

dtpass = tree.DecisionTreeClassifier()
pipe1 = Pipeline([('sampling', RandomOverSampler()), ('class', dtpass)])

parameters = {'class__max_depth': range(3, 7),
              'class__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'class__min_samples_leaf': [50]
             }

dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc'
)

dt2.fit(x, y)

This returns an error:

AttributeError: 'RandomOverSampler' object has no attribute '_validate_data'

What am I doing wrong here?

EDIT: Solution posted below

Upvotes: 1

Views: 2218

Answers (2)

Kaustubh Lohani

Reputation: 655

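Try building the pipeline with imblearn.pipeline.make_pipeline. Note that make_pipeline names each step after its lowercased class name, so the grid keys need the decisiontreeclassifier__ prefix instead of class__:

from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
import numpy as np

dtpass = DecisionTreeClassifier()
sampling = RandomOverSampler()

# Steps are auto-named 'randomoversampler' and 'decisiontreeclassifier'.
pipe1 = make_pipeline(sampling, dtpass)
# With explicit step names, the 'class__' prefix would apply instead:
# pipe1 = Pipeline([('sampling', RandomOverSampler()), ('class', dtpass)])

parameters = {'decisiontreeclassifier__max_depth': range(3, 7),
              'decisiontreeclassifier__ccp_alpha': np.arange(0, 0.001, 0.00025),
              'decisiontreeclassifier__min_samples_leaf': [50]
             }

dt2 = GridSearchCV(estimator=pipe1,
                   param_grid=parameters,
                   n_jobs=4,
                   scoring='roc_auc'
)

# x and y are the feature matrix and labels from the question.
dt2.fit(x, y)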

Upvotes: 1

Alex Ryu

Reputation: 61

After a lot of googling, I found the solution on this page:

https://makerspace.aisingapore.org/community/ai4i-5-supervised-learning/encountered-attributeerror-when-run-train_test_splitpreprocessed_data-output_var-after-randomoversampler/

The solution was to

 pip install -U imbalanced-learn

instead of

 conda install -c conda-forge imbalanced-learn
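
For anyone hitting the same error: it most likely means the installed imbalanced-learn and scikit-learn versions do not match each other, which would explain why reinstalling fixed it. A quick way to see what is actually installed in the active environment is sketched below. It uses only the standard library (Python 3.8+), and the helper name installed_version is my own, not part of either package.

```python
import importlib.metadata as md


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return md.version(package)
    except md.PackageNotFoundError:
        return None


# Distribution names as published on PyPI.
for name in ("scikit-learn", "imbalanced-learn"):
    print(name, installed_version(name))
```

If the two versions look out of step with each other, upgrading both in the same environment is worth trying before changing any code.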

Upvotes: 0
