Reputation: 31
I am a frequent user of scikit-learn, I want some insights about the “class_ weight ” parameter with SGD.
I was able to figure out till the function call
plain_sgd(coef, intercept, est.loss_function,
penalty_type, alpha, C, est.l1_ratio,
dataset, n_iter, int(est.fit_intercept),
int(est.verbose), int(est.shuffle), est.random_state,
pos_weight, neg_weight,
learning_rate_type, est.eta0,
est.power_t, est.t_, intercept_decay)
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/stochastic_gradient.py
After this it goes to sgd_fast and I am not very good with cpython. Can you give some celerity on these questions.
Upvotes: 3
Views: 3144
Reputation: 40169
class_weight
can indeed help increasing the ROC AUC or f1-score of a classification model trained on imbalanced data.
You can try class_weight="auto"
to select weights that are inversely proportional to class frequencies. You can also try to pass your own weights has a python dictionary with class label as keys and weights as values.
Tuning the weights can be achieved via grid search with cross-validation.
Internally this is done by deriving sample_weight
from the class_weight
(depending on the class label of each sample). Sample weights are then used to scale the contribution of individual samples to the loss function used to trained the linear classification model with Stochastic Gradient Descent.
The feature penalization is controlled independently via the penalty
and alpha
hyperparameters. sample_weight
/ class_weight
have no impact on it.
Upvotes: 6