Momentum 0.9 and 0.99 in SGD

Question

I have an SGD solver:

base_lr: 1e-2    
lr_policy: "step"
gamma: 0.1       
stepsize: 10000  
max_iter: 300000  
momentum: 0.9

The Caffe documentation suggests that "if you increase μ, it may be a good idea to decrease α accordingly (and vice versa)". Hence, if I choose a momentum of 0.99, I believe that base_lr must be 1e-4:

base_lr: 1e-4    
lr_policy: "step"
gamma: 0.1       
stepsize: 10000  
max_iter: 300000  
momentum: 0.99

Am I right? Do I need to increase the stepsize too? What is the benefit of using a bigger momentum (e.g. 0.99) compared to a smaller one (e.g. 0.9)?

Prune · Accepted Answer

Thanks for the clarification. No, the relationship is not that direct. How much you need to change the learning rate is something you determine by experimentation for your data set and max_iter (which also needs tuning). You might find that the best lr for momentum 0.99 is 1e-3, 1e-5, or something else. You might also find that 0.99 is too heavy for best results and that you need to back off to 0.92 or 0.97.
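To pick a starting point for that experimentation, it can help to look at how momentum scales the step size. Below is a minimal sketch, assuming the usual momentum update V <- μ·V − α·grad and a constant gradient (a simplification that real training does not satisfy). Under that assumption the step settles at roughly α / (1 − μ), so moving from μ = 0.9 to μ = 0.99 multiplies the long-run step by about 10, not 100. The function names `effective_lr` and `terminal_velocity` are made up for this illustration and are not part of Caffe.

```python
def effective_lr(lr, momentum):
    """Long-run step multiplier under a constant gradient: lr / (1 - momentum)."""
    return lr / (1.0 - momentum)


def terminal_velocity(lr, momentum, grad=1.0, steps=5000):
    """Iterate V <- momentum * V - lr * grad and return |V| after `steps` updates.

    Assumes the gradient stays constant, which is only a toy model of training.
    """
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad
    return abs(v)


# Compare the original setting with two candidates for momentum 0.99.
for lr, mu in [(1e-2, 0.9), (1e-3, 0.99), (1e-4, 0.99)]:
    print(
        f"lr={lr:g} momentum={mu:g} "
        f"closed form={effective_lr(lr, mu):.4f} "
        f"simulated={terminal_velocity(lr, mu):.4f}"
    )
```

By this heuristic, base_lr: 1e-3 with momentum 0.99 gives about the same long-run step as base_lr: 1e-2 with momentum 0.9, while 1e-4 gives a step about ten times smaller. Treat that only as the center of a search range. The value that works best still has to come from experiments on your own data.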

Without proper details on the situation, I can't guess at what will work for you better than the guess ranges I just gave. My work has focused more on tuning the other hyper-parameters; momentum = 0.90 served us well for all of our applications.
