Reputation: 279
For a given dataset, which regularization technique (L1 regularization or L2 regularization) produces the sparsest weights, assuming the same loss function and the same optimizer?
Upvotes: 2
Views: 1343
Reputation: 60321
L1 regularization (lasso) tends to drive some weights exactly to zero, thus leading to sparser solutions; as the Wikipedia entry on regularization puts it:
It can be shown that the L1 norm induces sparsity
See also the L1 and L2 Regularization Methods post at Towards Data Science:
The key difference between these techniques is that Lasso shrinks the less important feature’s coefficient to zero thus, removing some feature altogether. So, this works well for feature selection in case we have a huge number of features.
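You can see this directly in a quick experiment. Below is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data; the regularization strength `alpha` and the zero-tolerance threshold are arbitrary choices for illustration, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem: 100 features, only 10 of them informative
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)

# Same data, same regularization strength (alpha chosen arbitrarily for illustration)
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# Count coefficients that are numerically zero
zero_lasso = np.sum(np.abs(lasso.coef_) < 1e-6)
zero_ridge = np.sum(np.abs(ridge.coef_) < 1e-6)

print(f"Lasso (L1): {zero_lasso} of {lasso.coef_.size} coefficients are zero")
print(f"Ridge (L2): {zero_ridge} of {ridge.coef_.size} coefficients are zero")
```

On a run like this, Lasso typically zeroes out most of the uninformative coefficients, whereas Ridge only shrinks them toward zero without making them exactly zero.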
For more details, see the following threads at Cross Validated:
Sparsity in Lasso and advantage over ridge
Why does the Lasso provide Variable Selection?
Upvotes: 1