I have a SGD solver:
base_lr: 1e-2
lr_policy: "step"
gamma: 0.1
stepsize: 10000
max_iter: 300000
momentum: 0.9
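For reference, Caffe's "step" policy sets the learning rate to base_lr * gamma^floor(iter / stepsize). The short Python sketch below reproduces that formula for the first solver's values. `step_lr` is a hypothetical helper name, not Caffe code.

```python
def step_lr(iteration: int, base_lr: float, gamma: float, stepsize: int) -> float:
    """Learning rate under a "step" policy: base_lr * gamma^floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Values copied from the first solver above.
SOLVER = dict(base_lr=1e-2, gamma=0.1, stepsize=10000)

for it in (0, 9999, 10000, 25000, 50000):
    print(f"iter {it:>6}: lr = {step_lr(it, **SOLVER):.0e}")
```

With these values the rate drops tenfold every 10000 iterations. By iteration 50000 it is already 1e-7, long before max_iter: 300000, so the schedule is worth checking alongside the momentum question.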
Caffe's documentation suggests that "if you increase μ, it may be a good idea to decrease α accordingly (and vice versa)". So if I choose a momentum of 0.99, I believe the base_lr must be 1e-4:
base_lr: 1e-4
lr_policy: "step"
gamma: 0.1
stepsize: 10000
max_iter: 300000
momentum: 0.99
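The Caffe documentation justifies that rule by noting that a momentum μ multiplies the size of the updates by roughly 1/(1 - μ) after many iterations. The minimal Python sketch below compares base_lr / (1 - momentum) for both solvers. It assumes Caffe's SGD update (V <- μV - α∇L, then W <- W + V) and a constant gradient, which is a simplification because real gradients change every iteration. `effective_lr` and `simulated_step` are hypothetical helper names.

```python
def effective_lr(base_lr: float, momentum: float) -> float:
    """Long-run update size per unit of (constant) gradient: base_lr / (1 - momentum)."""
    return base_lr / (1.0 - momentum)


def simulated_step(base_lr: float, momentum: float, grad: float = 1.0, iters: int = 2000) -> float:
    """Magnitude of the update after `iters` steps of V <- momentum * V - base_lr * grad."""
    v = 0.0
    for _ in range(iters):
        v = momentum * v - base_lr * grad
    return -v  # grad > 0 makes v negative; report the update size


# First solver, the proposed second solver, and an alternative with base_lr 1e-3.
for base_lr, momentum in [(1e-2, 0.9), (1e-4, 0.99), (1e-3, 0.99)]:
    print(f"base_lr={base_lr:g}, momentum={momentum}: "
          f"base_lr/(1-momentum)={effective_lr(base_lr, momentum):.3g}, "
          f"simulated step={simulated_step(base_lr, momentum):.3g}")
```

Under this heuristic, moving from 0.9 to 0.99 multiplies the long-run step by 10. Matching the first solver therefore points to base_lr: 1e-3. With 1e-4, the long-run step is ten times smaller than before. This is only a starting point for tuning, not a rule.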
Am I right? Do I need to increase the stepsize too? What is the benefit of using a bigger momentum (e.g. 0.99) compared to a smaller one (e.g. 0.9)?
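One way to look at the stepsize question, and at what heavier momentum changes, is to ask how long the velocity takes to catch up with a steady gradient. Starting from zero velocity under a constant gradient, the velocity after n iterations is (1 - μ^n) times its long-run value. Momentum therefore acts like a running average of gradients over roughly 1/(1 - μ) iterations. A larger μ smooths noisy gradients over a longer window but reacts more slowly when they change. The sketch below counts the iterations needed to reach 95% of the long-run value. `iterations_to_settle` is a hypothetical helper, and the constant-gradient setting is a simplifying assumption.

```python
def iterations_to_settle(momentum: float, fraction: float = 0.95) -> int:
    """Iterations until the velocity reaches `fraction` of its long-run value
    under a constant gradient, starting from zero velocity."""
    n = 0
    remaining = 1.0  # momentum**n: the share of the long-run value not yet reached
    while remaining > 1.0 - fraction:
        remaining *= momentum
        n += 1
    return n


for mu in (0.9, 0.99):
    print(f"momentum={mu}: {iterations_to_settle(mu)} iterations to reach 95%")
```

On this crude measure, the windows are 29 and 299 iterations. Both are far shorter than stepsize: 10000, so the velocity has time to settle between learning-rate drops in either case. Whether the stepsize itself should change still has to be tested on your data.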
Answer:
Thanks for the clarification. No, this is not a direct, fixed relationship. You determine the amount of change you need by experimenting with your data set and max_iter (which also needs tuning). You might find that the best lr for momentum 0.99 is 1e-3, 1e-5, or something else. You might also find that 0.99 is too heavy for best results and that you need to back off to 0.92 or 0.97.
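The advice that the best lr for a given momentum has to be found by experiment can be illustrated with a small grid sweep. This is a toy sketch only. The "model" is a hypothetical two-dimensional ill-conditioned quadratic rather than a Caffe network, and `run_momentum_sgd` and `sweep` are illustrative names, not Caffe APIs. In practice, each cell of the grid would be a full training run with the real solver, scored on held-out data.

```python
import math

# Hypothetical toy objective standing in for a real network:
# f(w) = 0.5 * (A * w0^2 + B * w1^2), an ill-conditioned quadratic.
A, B = 1.0, 50.0


def run_momentum_sgd(lr: float, momentum: float, iters: int = 500) -> float:
    """Run momentum SGD (V <- momentum*V - lr*grad; W <- W + V) and return the final loss.

    Returns math.inf if the iterates blow up, i.e. lr is too large for this momentum.
    """
    w = [5.0, 5.0]
    v = [0.0, 0.0]
    curvature = (A, B)
    for _ in range(iters):
        for i in range(2):
            grad = curvature[i] * w[i]
            v[i] = momentum * v[i] - lr * grad
            w[i] += v[i]
        if max(abs(w[0]), abs(w[1])) > 1e12:
            return math.inf  # treat as diverged
    return 0.5 * (A * w[0] * w[0] + B * w[1] * w[1])


MOMENTA = (0.9, 0.99)
LRS = (1e-1, 1e-2, 1e-3, 1e-4, 1e-5)


def sweep(momenta=MOMENTA, lrs=LRS, iters=500):
    """Final toy loss for every (momentum, lr) pair."""
    return {(mu, lr): run_momentum_sgd(lr, mu, iters) for mu in momenta for lr in lrs}


results = sweep()
for mu in MOMENTA:
    best_lr = min(LRS, key=lambda lr: results[(mu, lr)])
    print(f"momentum={mu}: best lr on toy problem = {best_lr:g}, "
          f"loss = {results[(mu, best_lr)]:.3g}")
```

Even on this toy problem, lr = 0.1 diverges at both momentum values. Which of the remaining rates wins depends on the problem and the iteration budget, which is why the choice is made by sweeping rather than by formula.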
Without more details on your situation, I can't guess what will work for you any better than the ranges I just gave. My work has focused more on tuning the other hyper-parameters; momentum = 0.90 served us well for all of our applications.