Aditya Das

Reputation: 35

Issues with imbalanced dataset in case of binary classification

I have a binary classification problem where the class distribution is {0: 85%, 1: 15%}. I have tried re-weighting with class_weights and several sampling approaches, but every approach I have tried gives unsatisfactory results. My dataset has shape (91125, 57).

Accuracy:1
F1-Score:1
F2-Score:1
Precision:1
Recall:1
AUCROC:1
Kappa:1

Is there any other method I can use to handle such a situation?

Upvotes: 1

Views: 233

Answers (1)

maya-ami

Reputation: 410

Make sure you're dropping the target variable from your features before feeding the data to the classifier:

    X = df.drop('target', axis=1)
    y = df['target']
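A quick way to look for this kind of leakage is to check whether any remaining feature column simply repeats the target. The snippet below is only a minimal sketch: the helper `find_target_copies`, the toy DataFrame and the `label_copy` column are hypothetical names made up for illustration, and it assumes the target column is called `target` as above.

```python
import pandas as pd

def find_target_copies(df, target="target"):
    """Return the feature columns whose values equal the target on every row."""
    X = df.drop(columns=[target])
    y = df[target]
    # A column that matches the target row for row leaks the label
    return [col for col in X.columns if (X[col] == y).all()]

# Toy data for illustration only; "label_copy" is a made-up leaked column
toy = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.3, 0.9],
    "label_copy": [0, 1, 0, 1],
    "target": [0, 1, 0, 1],
})

leaked = find_target_copies(toy)
print(leaked)
```

This only catches exact copies. A column that is a rescaled or re-encoded version of the target would need a correlation check instead.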

I'd also check whether some independent variables are highly correlated with the target. That may give you an idea of what is causing the unrealistically perfect classification:

    import seaborn as sns
    # Use the full DataFrame so the heatmap includes the target's row and column
    sns.heatmap(df.corr())
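With 57 columns a heatmap can be hard to read, so it may be easier to rank the features by their absolute correlation with the target. This is a rough sketch on toy data: `rank_by_target_correlation`, `toy_X`, `toy_y` and the column names are hypothetical, and it assumes numeric features and Pearson correlation.

```python
import pandas as pd

def rank_by_target_correlation(X, y):
    """Rank feature columns by absolute Pearson correlation with the target."""
    corr = X.corrwith(y)  # correlation of each column of X with y
    return corr.abs().sort_values(ascending=False)

# Toy data for illustration only; "suspicious" mirrors the target exactly
toy_X = pd.DataFrame({
    "noise": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    "suspicious": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
})
toy_y = pd.Series([0, 1, 0, 1, 0, 1])

ranking = rank_by_target_correlation(toy_X, toy_y)
print(ranking)
```

Any feature with an absolute correlation at or very close to 1 is worth inspecting, since it may have been derived from the label.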

Upvotes: 1
