Reputation: 347
How does using the 1-norm of the weight vector in the cost function increase sparsity, compared to using the 2-norm of the weight vector in the same cost function for an SVM?
For the 1-norm: cost function - minimize ||w||_1
For the 2-norm: cost function - minimize ||w||_2
Is it related to LP-SVM?
Upvotes: 1
Views: 1042
Reputation: 9290
Look at the partial derivative of the l_1 penalty with respect to some parameter.
That derivative is constant: the penalty grows at the same rate no matter how small the weight already is. So any increase in a weight has to offset some fixed amount of error, regardless of how small the weight is, and weights that don't earn their keep get driven all the way to zero.
Compare this to the l_2 penalty, whose derivative scales with the size of the current parameter. As a weight gets near 0, it only needs an infinitesimal decrease in error to offset the regularization penalty, so it settles at a small nonzero value rather than exactly zero.
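To make this concrete, here is a minimal sketch (my own illustration, assuming the usual squared form w^2 of the l_2 penalty) comparing the magnitudes of the two penalty gradients as a single weight shrinks toward zero:

```python
import numpy as np

# Subgradient of the l1 penalty |w| is sign(w): magnitude 1 everywhere.
# Gradient of the squared l2 penalty w**2 is 2*w: it vanishes as w -> 0.
for w in [1.0, 0.1, 0.01, 0.001]:
    l1_grad = np.sign(w)   # constant pressure toward zero
    l2_grad = 2 * w        # pressure fades as w shrinks
    print(f"w={w:6.3f}  |l1 grad|={abs(l1_grad):.3f}  |l2 grad|={abs(l2_grad):.3f}")
```

The l_1 pressure stays at 1 no matter how small w gets, while the l_2 pressure fades away, which is exactly why l_1 pushes small weights to exactly zero and l_2 does not.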
Upvotes: 2
Reputation: 4157
Note that for a weight w with 0 < |w| < 1 (which is the usual case), the L2 penalty w^2 is smaller than the L1 penalty |w|, since squaring a number below 1 shrinks it.
That's why the L1 penalty is the harder constraint on small weights: it keeps pressing them toward zero, which results in a sparse vector.
This isn't specific to SVMs; many algorithms use L1 or L2 regularization.
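As a quick demonstration (a sketch using scikit-learn's LinearSVC; the toy data, C value, and feature counts are my own choices, not from the question):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy data: 20 features, but only the first 3 actually determine the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

# L1-penalized linear SVM (requires dual=False) vs. the standard L2 penalty.
svm_l1 = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(X, y)
svm_l2 = LinearSVC(penalty="l2", dual=False, C=0.1, max_iter=10000).fit(X, y)

print("nonzero weights, L1:", np.count_nonzero(svm_l1.coef_))  # typically few
print("nonzero weights, L2:", np.count_nonzero(svm_l2.coef_))  # typically all 20
```

The L1-penalized model usually zeroes out most of the uninformative features, while the L2-penalized one merely shrinks them.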
Upvotes: 0