Reputation: 1942
Is it really necessary to optimize the initial learning rate when using ADAM as optimizer in tensorflow/keras? How can this be done (in tensorflow 2.x)?
Upvotes: 0
Views: 1545
Reputation: 2011
It is. As with any hyperparameter, an optimal learning rate should be searched for. Your model may fail to learn if the learning rate is too large or too small, even with an optimizer like ADAM, which has nice properties regarding decay etc.
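To answer the "how" part for TF 2.x: you pass the initial learning rate directly to the optimizer when compiling the model. Here is a minimal sketch; the layer sizes, input shape and loss are just placeholders, not anything specific to your problem:

```python
import tensorflow as tf

# Adam with an explicitly chosen initial learning rate (the Keras default is 1e-3)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=optimizer, loss="mse")
```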
An example of how a model behaves under the ADAM optimizer with different learning rates can be seen in the article How to pick the best learning rate for your machine learning project.
Searching for the right hyperparameters is called hyperparameter tuning. I am not using TF 2.* in my projects, so I will point to what TensorFlow itself offers: Hyperparameter Tuning with the HParams Dashboard. A simple manual search is sketched below if you don't want to set up the dashboard.
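If you don't need the full HParams dashboard, a plain loop over a few candidate learning rates is already a rough form of tuning. This is only a sketch with toy data; the candidate values, model and epoch count are placeholders you would replace with your own:

```python
import numpy as np
import tensorflow as tf

# Toy data purely for illustration
x = np.random.rand(256, 20).astype("float32")
y = np.random.rand(256, 1).astype("float32")

def build_model(lr):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model

results = {}
for lr in [1e-2, 1e-3, 1e-4]:  # candidate initial learning rates
    model = build_model(lr)
    history = model.fit(x, y, epochs=5, validation_split=0.2, verbose=0)
    results[lr] = min(history.history["val_loss"])

best_lr = min(results, key=results.get)
print("validation loss per learning rate:", results)
print("best learning rate:", best_lr)
```

The same loop structure is what the HParams tooling automates and visualizes for you, so it is a reasonable starting point before moving to a proper tuning library.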
Upvotes: 1