Samuel

Reputation: 6247

Why does limiting the weight size prevent overfitting in machine learning?

The most popular way to prevent overfitting in machine learning (e.g. logistic regression, neural networks, linear regression) is weight decay (L2 or L1 regularization). The purpose of weight decay is to keep the weights from getting large.
My question is: why do small weights prevent overfitting?
And what if I instead normalize the weights after each iteration (see the sketch below)?
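To make the two options concrete, here is a minimal sketch with made-up data and hyperparameters: the usual L2 weight-decay update, plus (commented out) the per-iteration normalization variant I have in mind.

```python
# A minimal sketch (made-up data, learning rate, and lambda) of one
# gradient-descent step for logistic regression with L2 weight decay.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                              # made-up features
y = (X @ np.array([1., -2., 0.5, 0., 3.]) > 0).astype(float)   # made-up 0/1 labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(5)
lr, lam = 0.1, 0.01                    # assumed hyperparameters

for _ in range(100):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # logistic-loss gradient
    w -= lr * (grad + lam * w)                   # weight decay shrinks w every step

    # The alternative asked about: project w back to a bounded norm instead.
    # w *= 1.0 / max(1.0, np.linalg.norm(w))

print("||w|| =", np.linalg.norm(w))
```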

Upvotes: 3

Views: 4039

Answers (3)

Oussama

Reputation: 3244

A small example using logistic regression that explains the concept:

In logistic regression, the probability of y given x is

Pr(y | x) = 1 / (1 + exp(-y * w.x))

For y = 1, Pr(1 | x) → 1 as w.x → +infinity, and Pr(1 | x) → 0 as w.x → -infinity.

You overfit your data if the predicted probabilities on your training set are 1 or 0 (or very close to it).

Therefore, adding a regularization term prevents your weights from going to infinity.
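Here is a minimal numpy sketch (the x and w values are made up, not part of the argument above) showing how scaling the weights pushes the predicted probability toward 0 or 1:

```python
# Sketch: as the weight vector w grows, Pr(y | x) saturates at 0 or 1.
import numpy as np

def prob(y, x, w):
    """Pr(y | x) = 1 / (1 + exp(-y * w.x)) for labels y in {-1, +1}."""
    return 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))

x = np.array([0.5, -1.2, 2.0])   # one training point (made-up values)
w = np.array([0.3, 0.1, 0.4])    # some fitted weight vector (made-up values)

for scale in [1, 10, 100]:
    p = prob(+1, x, scale * w)
    print(f"scale={scale:>3}: Pr(y=+1 | x) = {p:.6f}")

# As the weights grow, the probability saturates near 1 (or 0 for y = -1),
# which is exactly the over-confident, overfitting regime described above.
```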

Upvotes: 0

rahulm

Reputation: 734

Imagine the parabola ax^2 + bx + c. The larger the coefficient a is, the skinnier the parabola becomes and the more tightly it can bend toward individual data points. Overfitting happens when the curve fitted to the data follows the training points too closely, which requires large coefficients. Therefore, keeping the coefficients small (and generally sparse) can prevent overfitting.
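Here is a minimal sketch of the same idea (made-up noisy data, a degree-9 polynomial instead of a parabola, and an assumed ridge penalty lambda): with an L2 penalty the fit cannot use large coefficients to chase the noise.

```python
# Sketch: fit a degree-9 polynomial with and without an L2 (ridge) penalty
# and compare the size of the learned coefficients.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 15)
y = x**2 + 0.1 * rng.standard_normal(x.size)   # noisy parabola (made-up data)

X = np.vander(x, N=10, increasing=True)        # columns 1, x, ..., x^9

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in [0.0, 1.0]:
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam}: max |coefficient| = {np.abs(w).max():.3f}")

# The unregularized fit is free to use large high-degree coefficients to
# chase the noise; the penalized fit keeps them small and the curve smoother.
```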

Upvotes: 3

user3784553

Reputation:

A large subset of machine learning techniques have mathematical models that require large coefficients/weights to represent sudden changes, incoherence, or other high-dimensional phenomena exhibited by individual data points in the training data. By limiting the coefficients, one essentially limits the expressiveness of the model to "smooth" or low-dimensional solutions, which (depending on the specific problem you are trying to solve) may fit real-world data better under most metrics. In this sense, regularization can be viewed as a smoothness prior that we heuristically established by observing real-world data and then incorporated into the training process of the mathematical model as a regularization term.
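As a rough illustration of that last sentence (a sketch assuming a zero-mean Gaussian prior with an arbitrary sigma, which is not stated above), the negative log of such a prior is exactly an L2 penalty up to a constant, so minimizing "loss + L2 penalty" is MAP estimation under that prior:

```python
# Sketch: a zero-mean Gaussian prior on the weights corresponds to an
# L2 regularization term in the training objective.
import numpy as np

sigma = 2.0                          # prior standard deviation (assumed)
w = np.array([0.5, -1.0, 3.0])       # example weight vector (made-up values)

neg_log_prior = 0.5 * np.sum(w**2) / sigma**2 \
    + 0.5 * w.size * np.log(2 * np.pi * sigma**2)
l2_penalty = (1.0 / (2 * sigma**2)) * np.sum(w**2)

# The two differ only by a constant that does not depend on w.
print(neg_log_prior - l2_penalty)
```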

Upvotes: 2
