Issues with imbalanced dataset in case of binary classification

Question

I have a binary classification problem where the class distribution is {0: 85%, 1: 15%}. I have tried re-weighting via class_weights and other sampling approaches, but every approach I have used gives unsatisfactory results. My dataset has shape (91125, 57).

Accuracy:1
F1-Score:1
F2-Score:1
Precision:1
Recall:1
AUCROC:1
Kappa:1

Is there any other method I can use to handle such a situation?

maya-ami · Accepted Answer

Make sure you're dropping the target variable from your features before feeding the data to the classifier:

X = df.drop('target', axis=1)
y = df['target']

I'd also check whether some independent variables are highly correlated with the target. That may give you an idea of what causes the unrealistically perfect classification:

import seaborn as sns
# Plot the full frame (target included) so feature-target correlations are visible
sns.heatmap(df.corr())
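With 57 columns a heatmap can be hard to read, so it may help to rank the features by their correlation with the target instead. Here is a minimal sketch, assuming the label column is named 'target' as above. The helper name `rank_target_correlation`, the 0.95 cutoff, and the synthetic demo data are my own illustrative choices, not anything from the question:

```python
import numpy as np
import pandas as pd

def rank_target_correlation(df, target="target"):
    """Absolute Pearson correlation of each numeric feature with the target,
    sorted from strongest to weakest. (Hypothetical helper for illustration.)"""
    numeric = df.select_dtypes(include="number")
    corr = numeric.drop(columns=[target]).corrwith(numeric[target]).abs()
    return corr.sort_values(ascending=False)

# Synthetic data for illustration only (not the asker's dataset)
rng = np.random.default_rng(0)
n = 1000
y = (rng.random(n) < 0.15).astype(int)  # roughly 15% positives
demo = pd.DataFrame({
    "noise": rng.normal(size=n),   # unrelated feature
    "leaky_copy": y * 2 + 1,       # deterministic function of the target
    "target": y,
})

ranking = rank_target_correlation(demo)
suspects = ranking[ranking > 0.95]  # 0.95 is an arbitrary cutoff (assumption)
print(ranking)
print("Possible leakage:", list(suspects.index))
```

Any feature that lands near 1.0 is worth inspecting by hand. Keep in mind that Pearson correlation only picks up linear relationships, so a clean ranking does not rule out leakage through a non-linear or categorical column.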
