Reputation: 603
I am learning the backpropagation algorithm used to train neural networks. It kind of makes sense, but there is still one part I don't get.
As far as I understand, the error derivative is calculated with respect to all weights in the network. This results in an error gradient whose number of dimensions is the number of weights in the net. Then, the weights are changed by the negative of this gradient, multiplied by the learning rate.
This seems about right, but why is the gradient not normalized? What is the rationale behind the length of the delta vector being proportional to the length of the gradient vector?
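To check that I have the update rule right, here is a minimal NumPy sketch of what I mean (the function names and the toy loss are mine, not taken from any library); the second function is the normalized alternative I am asking about:

```python
import numpy as np

def gd_step(weights, grad, learning_rate=0.01):
    """Plain gradient-descent update: the step length scales with the gradient norm."""
    return weights - learning_rate * grad

def normalized_gd_step(weights, grad, learning_rate=0.01, eps=1e-12):
    """Hypothetical variant: the step has a fixed length regardless of gradient magnitude."""
    return weights - learning_rate * grad / (np.linalg.norm(grad) + eps)

# Toy illustration on f(w) = w1^2 + 100 * w2^2 (a badly conditioned bowl).
w = np.array([1.0, 1.0])
grad = np.array([2 * w[0], 200 * w[1]])  # analytic gradient of the bowl

print(gd_step(w, grad))             # step proportional to the gradient
print(normalized_gd_step(w, grad))  # unit-length direction, scaled only by the learning rate
```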
Upvotes: 1
Views: 341
Reputation: 5151
You don't normalize the gradient. In backpropagation the weights are updated by gradient descent on the error, so the step is meant to stay proportional to the gradient: larger where the error surface is steep, smaller where it is flat. Instead of normalizing the gradient, you normalize and scale your input. That keeps the movement on the error surface proportional and better conditioned, and proportional movement on the error surface gives a faster approach to a local (or sometimes global) minimum. Here you can see an explanation of what normalization does.
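To illustrate the point, here is a minimal NumPy sketch (the synthetic data, the `fit` helper, and the learning rates are made up for this example, not from any specific library): with raw, badly scaled inputs the error surface is ill conditioned and plain gradient steps barely move one of the weights, while with standardized inputs the same gradient-descent budget converges.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two input features on very different scales (chosen for illustration).
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 100, 200)])
true_w = np.array([3.0, 0.05])
y = X @ true_w + rng.normal(0, 0.1, 200)

def fit(X, y, lr, steps=500):
    """Plain gradient descent on mean squared error; returns the final training error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad                         # unnormalized gradient step
    return np.mean((X @ w - y) ** 2)

# Raw inputs: the learning rate must be tiny to avoid divergence, so progress is slow.
print("raw inputs:       ", fit(X, y, lr=1e-5))

# Standardized inputs: the error surface is better conditioned, the same budget converges.
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
print("normalized inputs:", fit(Xn, y, lr=0.1))
```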
Upvotes: 2