Reputation: 2825
I am trying to understand the purpose of the ReduceLROnPlateau() callback in Keras.
I understand that it reduces the learning rate when the validation loss stops improving. But won't this keep the network from getting out of a local minimum? What if the network sits in a local minimum for about 5 epochs and the learning rate gets reduced further, when increasing the learning rate would actually help the network escape?
In other words, how does it tell whether it has reached a local minimum or a plateau?
Upvotes: 4
Views: 318
Reputation: 53758
First, here is a good explanation from the CS231n class of why learning rate decay is reasonable in general:
In training deep networks, it is usually helpful to anneal the learning rate over time. Good intuition to have in mind is that with a high learning rate, the system contains too much kinetic energy and the parameter vector bounces around chaotically, unable to settle down into deeper, but narrower parts of the loss function. Knowing when to decay the learning rate can be tricky: Decay it slowly and you’ll be wasting computation bouncing around chaotically with little improvement for a long time. But decay it too aggressively and the system will cool too quickly, unable to reach the best position it can.
Concerning your question: unfortunately, it can't know. If the optimizer hits a deep valley and can't get out, it simply hopes that the valley is a good one and worth exploring with a smaller learning rate. Currently there is no technique that tells whether better valleys exist, i.e., whether the minimum is local or global. So the optimizer bets on exploring the current valley rather than jumping far away and starting over. In practice, local minima tend to be of similar quality, with none much worse than the others, which is why this strategy often works.
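To make that bet concrete, here is a minimal sketch in plain Python of a patience-based rule like the one ReduceLROnPlateau applies. It is a simplification and not the Keras source: the function name reduce_on_plateau is made up for illustration, and details such as the cooldown period are left out. The parameter names factor, patience, min_delta and min_lr mirror the callback's arguments.

```python
def reduce_on_plateau(val_losses, lr=0.1, factor=0.5, patience=5,
                      min_delta=1e-4, min_lr=0.0):
    """Return the learning rate in effect for each epoch.

    Simplified illustration only: no cooldown, and "improvement" always
    means a lower monitored value.
    """
    best = float("inf")   # best validation loss seen so far
    wait = 0              # epochs since the last improvement
    lr_per_epoch = []
    for loss in val_losses:
        lr_per_epoch.append(lr)          # rate used during this epoch
        if loss < best - min_delta:
            best = loss                  # real improvement: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:         # stalled for `patience` epochs
                lr = max(lr * factor, min_lr)
                wait = 0
    return lr_per_epoch


# Three improving epochs, then the validation loss stalls at 0.7.
losses = [1.0, 0.8, 0.7] + [0.7] * 6
schedule = reduce_on_plateau(losses, lr=0.1, factor=0.5, patience=5)
print(schedule)
```

Note that the rule only ever looks at the history of the monitored value. It has no information about the shape of the loss surface, so it cannot distinguish a local minimum from a plateau; it just shrinks the step size once progress stalls.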
Also note that the loss surface may look like a plateau at one learning rate but not at a learning rate 10 times smaller. So "escaping a plateau" and "escaping a local minimum" are different challenges, and ReduceLROnPlateau aims at the first one.
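A toy example of this effect, as a sketch under simple assumptions: plain gradient descent on the one-dimensional quadratic loss 0.5 * x^2 (the function names below are made up for illustration). With a learning rate of 2.0 the parameter just bounces between 1 and -1 and the loss never changes, which from the outside looks exactly like a plateau. With a learning rate 10 times smaller the loss drops at every step.

```python
def loss(x):
    # Toy loss: a 1-D quadratic bowl with its minimum at x = 0.
    return 0.5 * x * x


def gradient_descent(x0, lr, steps):
    """Run plain gradient descent and return the loss after each step."""
    x = x0
    losses = [loss(x)]
    for _ in range(steps):
        grad = x              # derivative of 0.5 * x^2
        x = x - lr * grad
        losses.append(loss(x))
    return losses


stalled = gradient_descent(x0=1.0, lr=2.0, steps=10)    # x flips sign each step
improving = gradient_descent(x0=1.0, lr=0.2, steps=10)  # x shrinks by 0.8 each step

print("lr=2.0:", stalled)
print("lr=0.2:", improving)
```

Real loss surfaces are far more complicated than this, but the same mechanism applies: a stalled loss can be a property of the step size rather than of the surface, and lowering the learning rate is what fixes that case.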
Upvotes: 4