Reputation: 347
How does using the 1-norm of the weight vector in the cost function increase sparsity, compared to using the 2-norm of the weight vector in the same cost function for an SVM?
For the 1-norm: cost function - minimize ||w||_1
For the 2-norm: cost function - minimize ||w||_2
Is it related to LP-SVM?
Upvotes: 1
Views: 1042
Reputation: 9290
Look at the partial derivative of the l_1 penalty with respect to some parameter.
That derivative is constant: the penalty grows at the same rate no matter how small the weight already is. So any increase in a weight has to offset some fixed amount of error, regardless of how small the weight is, and weights that don't earn their keep get driven all the way to zero.
Compare this to the l_2 penalty, whose derivative scales with the size of the current parameter. As a weight gets near 0, it only needs an infinitesimal decrease in error to offset the regularization penalty, so it settles at a small nonzero value rather than exactly zero.
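To make this concrete, here is a minimal sketch (my own illustration, assuming the usual squared form w^2 of the l_2 penalty) comparing the magnitudes of the two penalty gradients as a single weight shrinks toward zero:

```python
import numpy as np

# Subgradient of the l1 penalty |w| is sign(w): magnitude 1 everywhere.
# Gradient of the squared l2 penalty w**2 is 2*w: it vanishes as w -> 0.
for w in [1.0, 0.1, 0.01, 0.001]:
    l1_grad = np.sign(w)   # constant pressure toward zero
    l2_grad = 2 * w        # pressure fades as w shrinks
    print(f"w={w:6.3f}  |l1 grad|={abs(l1_grad):.3f}  |l2 grad|={abs(l2_grad):.3f}")
```

The l_1 pressure stays at 1 no matter how small w gets, while the l_2 pressure fades away, which is exactly why l_1 pushes small weights to exactly zero and l_2 does not.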
Upvotes: 2
Reputation: 4157
Note that for a weight w with 0 < |w| < 1 (which is the usual case), the L2 penalty w^2 is smaller than the L1 penalty |w|, since squaring a number below 1 shrinks it.
That's why the L1 penalty is the harder constraint on small weights: it keeps pressing them toward zero, which results in a sparse vector.
This isn't specific to SVMs; many algorithms use L1 or L2 regularization.
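As a quick demonstration (a sketch using scikit-learn's LinearSVC; the toy data, C value, and feature counts are my own choices, not from the question):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy data: 20 features, but only the first 3 actually determine the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

# L1-penalized linear SVM (requires dual=False) vs. the standard L2 penalty.
svm_l1 = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(X, y)
svm_l2 = LinearSVC(penalty="l2", dual=False, C=0.1, max_iter=10000).fit(X, y)

print("nonzero weights, L1:", np.count_nonzero(svm_l1.coef_))  # typically few
print("nonzero weights, L2:", np.count_nonzero(svm_l2.coef_))  # typically all 20
```

The L1-penalized model usually zeroes out most of the uninformative features, while the L2-penalized one merely shrinks them.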
Upvotes: 0