nit254

Reputation: 31

What does the class_weight parameter do in scikit-learn SGD?

I am a frequent user of scikit-learn, and I would like some insight into what the "class_weight" parameter does with SGD.

I was able to trace the code down to this function call:

plain_sgd(coef, intercept, est.loss_function,
                 penalty_type, alpha, C, est.l1_ratio,
                 dataset, n_iter, int(est.fit_intercept),
                 int(est.verbose), int(est.shuffle), est.random_state,
                 pos_weight, neg_weight,
                 learning_rate_type, est.eta0,
                 est.power_t, est.t_, intercept_decay)

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/stochastic_gradient.py

After this it goes into sgd_fast, and I am not very good with Cython. Can you give some clarity on these questions?

  1. I have a class imbalance in the dev set: the positive class has about 15k samples and the negative class about 36k. Will class_weight resolve this problem, or would undersampling be a better idea? I am getting better numbers with it, but it's hard to explain why.
  2. If yes, how does it actually do it? Is it applied to the feature penalization, or is it a weight on the optimization function? How can I explain this to a layman?

Upvotes: 3

Views: 3144

Answers (1)

ogrisel

Reputation: 40169

class_weight can indeed help increase the ROC AUC or F1 score of a classification model trained on imbalanced data.

You can try class_weight="auto" to select weights that are inversely proportional to class frequencies. You can also pass your own weights as a Python dictionary with class labels as keys and weights as values.
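For illustration, a minimal sketch of both options on a synthetic imbalanced dataset (the data and the weight value 2.4 are hypothetical; note that in later scikit-learn releases the "auto" heuristic was renamed "balanced"):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Toy imbalanced problem: roughly 70% negatives / 30% positives.
X, y = make_classification(n_samples=1000, weights=[0.7, 0.3],
                           random_state=0)

# Weights inversely proportional to class frequencies
# ("auto" in old scikit-learn versions, "balanced" in current ones).
clf_auto = SGDClassifier(class_weight="balanced", random_state=0).fit(X, y)

# Explicit per-class weights: {class label: weight}.
clf_dict = SGDClassifier(class_weight={0: 1.0, 1: 2.4},
                         random_state=0).fit(X, y)
```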

Tuning the weights can be achieved via grid search with cross-validation.
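A sketch of that tuning loop, assuming a synthetic dataset and an arbitrary grid of candidate weights for the positive class, scored with F1 so the imbalance is reflected in model selection:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, weights=[0.7, 0.3],
                           random_state=0)

# Candidate class_weight dicts: keep the negative class at 1,
# vary the positive-class weight (values here are arbitrary).
param_grid = {"class_weight": [{0: 1, 1: w} for w in (1, 2, 4, 8)]}

search = GridSearchCV(SGDClassifier(random_state=0), param_grid,
                      scoring="f1", cv=5)
search.fit(X, y)
```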

Internally this is done by deriving a sample_weight from the class_weight (depending on the class label of each sample). Sample weights are then used to scale the contribution of individual samples to the loss function used to train the linear classification model with Stochastic Gradient Descent.
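A quick way to see this on toy data (the weight value 3.0 is an arbitrary choice): giving the positive class a class_weight of 3 should match passing an explicit per-sample weight of 3 for the positive samples, since with a fixed random_state both fits see the samples in the same order.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, weights=[0.7, 0.3],
                           random_state=0)

# Weight the positive class 3x via class_weight.
clf_cw = SGDClassifier(class_weight={0: 1.0, 1: 3.0},
                       random_state=0).fit(X, y)

# Same thing, deriving the per-sample weights by hand.
sw = np.where(y == 1, 3.0, 1.0)
clf_sw = SGDClassifier(random_state=0).fit(X, y, sample_weight=sw)
```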

The feature penalization is controlled independently via the penalty and alpha hyperparameters. sample_weight / class_weight have no impact on it.

Upvotes: 6
