Mario L

Reputation: 577

Class weights vs under/oversampling

In imbalanced classification (with scikit-learn), what is the difference between balancing classes (i.e. setting class_weight to "balanced") and oversampling, for example with SMOTE? What are the expected effects of one versus the other?

Upvotes: 11

Views: 6745

Answers (1)

Constanza Garcia

Reputation: 366

Class weights directly modify the loss function: errors on a class with a higher weight are penalized more, and errors on a class with a lower weight are penalized less. In effect, you sacrifice some ability to predict the lower-weighted class (the majority class in an imbalanced dataset) by deliberately biasing the model toward more accurate predictions of the higher-weighted class (the minority class).
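To make the weighting concrete, here is a minimal sketch in plain Python. It assumes the heuristic that scikit-learn documents for class_weight="balanced", namely n_samples / (n_classes * count_of_class); the helper name `balanced_class_weights` and the 90/10 toy labels are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weights inversely proportional to class frequency.

    Assumed to mirror the documented "balanced" heuristic:
    n_samples / (n_classes * count_of_class).
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)

# Total weight carried by each class once every row is weighted.
class_counts = Counter(y)
total_weight = {label: weights[label] * class_counts[label]
                for label in class_counts}

print(weights)       # 5.0 for class 1, about 0.556 for class 0
print(total_weight)  # both classes end up with a total weight of about 50
```

With these weights, each minority row counts nine times as much as a majority row in the loss, so both classes contribute the same total weight.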

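Oversampling and undersampling methods essentially give more weight to particular classes as well: duplicating an observation duplicates its penalty, giving it more influence on the model fit. The results will still differ slightly from class weighting, because resampling changes which rows end up in the data splits that typically take place during training (for example cross-validation folds or bootstrap samples). Note also that SMOTE does not duplicate rows; it synthesizes new minority points by interpolating between neighbors, so it is not exactly equivalent to any reweighting of the original data.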
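The link between duplication and weighting can be checked on a toy example. The sketch below is plain Python with made-up labels and predicted probabilities, and `log_loss_sum` is a hypothetical helper rather than a scikit-learn function. It compares the summed log loss on a dataset where the single minority row is repeated three times with the loss on the original rows where that row gets a weight of 3.

```python
import math

def log_loss_sum(y_true, p_pos, sample_weight=None):
    """Sum of per-row binary log losses, optionally weighted per row."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    total = 0.0
    for label, p, w in zip(y_true, p_pos, sample_weight):
        # Probability the model assigned to the true label.
        p_label = p if label == 1 else 1.0 - p
        total -= w * math.log(p_label)
    return total

# Made-up data: three majority rows, one minority row.
y = [0, 0, 0, 1]
p = [0.2, 0.1, 0.3, 0.4]  # predicted P(y = 1) from some fixed model

# Random oversampling: repeat the minority row two more times.
y_over = y + [1, 1]
p_over = p + [0.4, 0.4]

loss_oversampled = log_loss_sum(y_over, p_over)
loss_weighted = log_loss_sum(y, p, sample_weight=[1, 1, 1, 3])

print(math.isclose(loss_oversampled, loss_weighted))
```

The two losses agree here only because the duplicates are exact copies evaluated on one fixed training set. Once the rows are split into folds or resampled during training, the duplicated and weighted versions no longer see the same data.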

Please refer to https://datascience.stackexchange.com/questions/52627/why-class-weight-is-outperforming-oversampling

Upvotes: 10
