Reputation: 303
Do we need to set sample_weight when we evaluate our model? I have trained a classification model, but the dataset is imbalanced. When I set sample_weight with compute_sample_weight('balanced'), the scores look very good: precision 0.88 and recall 0.86 for the '1' class. But the scores are bad if I don't set sample_weight: precision 0.85 and recall 0.21. Will sample_weight destroy the original data distribution?
Upvotes: 7
Views: 8374
Reputation: 491
Here is my understanding: sample_weight has nothing to do with balanced or unbalanced classes in itself; it is just a way to reflect the distribution of the sample data. A weight of n on a row means the same thing as repeating that row n times, so the following two expressions are equivalent, and expression 1 is clearly more efficient in terms of space. This sample_weight behaves the same way sample weights do in any other statistical package in any language and has nothing to do with random sampling.
expression 1
X = [[1,1],[2,2]]
y = [0,1]
sample_weight = [1000,2000] # total 3000
versus
expression 2
X = [[1,1],[2,2],[2,2],...,[1,1],[2,2],[2,2]] # total 3000 rows
y = [0,1,1,...,0,1,1]
sample_weight = [1,1,1,...,1,1,1] # or just set as None
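To make the equivalence concrete, here is a minimal sketch (assuming scikit-learn, with LogisticRegression as a hypothetical stand-in for whatever estimator you use) showing that fitting on the two weighted rows and fitting on the 3000 physically repeated rows yield the same model, up to optimizer tolerance:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1, 1], [2, 2]])
y = np.array([0, 1])
sample_weight = np.array([1000, 2000])

# Expression 1: compact data plus weights
clf_weighted = LogisticRegression(tol=1e-8, max_iter=10000).fit(
    X, y, sample_weight=sample_weight)

# Expression 2: the same rows physically repeated 1000 and 2000 times, no weights
X_rep = np.repeat(X, sample_weight, axis=0)   # 3000 rows
y_rep = np.repeat(y, sample_weight)
clf_repeated = LogisticRegression(tol=1e-8, max_iter=10000).fit(X_rep, y_rep)

# The fitted coefficients should agree up to optimizer tolerance
print(clf_weighted.coef_, clf_repeated.coef_)
print(np.allclose(clf_weighted.coef_, clf_repeated.coef_, atol=1e-4))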
Upvotes: 0
Reputation: 1084
The sample_weight parameter is only used during training.
Suppose you have a dataset with 16 points belonging to class "0" and 4 points belonging to class "1".
Without this parameter, every point has a weight of 1 in the loss calculation: all points contribute equally to the loss that the model is minimizing. That means 80% of the loss is due to points of class "0" and 20% is due to points of class "1".
By setting it to "balanced", scikit-learn will automatically calculate weights to assign to class "0" and class "1" such that 50% of the loss comes from class "0" and 50% from class "1".
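As a quick check of that arithmetic (a minimal sketch, assuming scikit-learn), compute_sample_weight('balanced') on this 16-vs-4 example gives each class the same total weight, so each class drives half of the loss:

import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([0] * 16 + [1] * 4)
w = compute_sample_weight('balanced', y)

print(w[y == 0][0], w[y == 1][0])        # 0.625 per class "0" point, 2.5 per class "1" point
print(w[y == 0].sum(), w[y == 1].sum())  # 10.0 and 10.0: equal total contributions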
This parameter affects the "optimal threshold" you need to use to separate class "0" predictions from class "1" predictions, and it also influences the performance of your model.
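Putting it together, here is a minimal end-to-end sketch (assuming scikit-learn; the synthetic dataset and LogisticRegression are hypothetical stand-ins for the asker's actual data and model): weight the training loss with 'balanced' sample weights, then evaluate without weights so precision and recall reflect the real, imbalanced test distribution.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic imbalanced problem: roughly 90% class "0", 10% class "1"
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# sample_weight only enters the training objective
w_tr = compute_sample_weight('balanced', y_tr)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr, sample_weight=w_tr)

# No sample_weight here: precision/recall are measured on the untouched,
# imbalanced test distribution
print(classification_report(y_te, clf.predict(X_te)))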
Upvotes: 0