Reputation: 279
For a given dataset, which regularization technique (L1 regularization or L2 regularization) produces the sparsest weights, assuming the same loss function and the same optimizer?
Upvotes: 2
Views: 1343
Reputation: 60321
L1 regularization (lasso) tends to drive some weights exactly to zero, thus leading to sparser solutions; as the Wikipedia entry on regularization puts it:
It can be shown that the L1 norm induces sparsity
See also the L1 and L2 Regularization Methods post at Towards Data Science:
The key difference between these techniques is that Lasso shrinks the less important feature’s coefficient to zero thus, removing some feature altogether. So, this works well for feature selection in case we have a huge number of features.
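You can see this directly in a quick experiment. Below is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data; the regularization strength `alpha` and the zero-tolerance threshold are arbitrary choices for illustration, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem: 100 features, only 10 of them informative
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)

# Same data, same regularization strength (alpha chosen arbitrarily for illustration)
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# Count coefficients that are numerically zero
zero_lasso = np.sum(np.abs(lasso.coef_) < 1e-6)
zero_ridge = np.sum(np.abs(ridge.coef_) < 1e-6)

print(f"Lasso (L1): {zero_lasso} of {lasso.coef_.size} coefficients are zero")
print(f"Ridge (L2): {zero_ridge} of {ridge.coef_.size} coefficients are zero")
```

On a run like this, Lasso typically zeroes out most of the uninformative coefficients, whereas Ridge only shrinks them toward zero without making them exactly zero.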
For more details, see the following threads at Cross Validated:
Sparsity in Lasso and advantage over ridge
Why does the Lasso provide Variable Selection?
Upvotes: 1