user288609

Reputation: 13055

Adam optimizer and momentum optimizer

I am trying to run image segmentation code based on the U-Net architecture. During experimentation, I found that the Adam optimizer runs much more slowly than the momentum optimizer. Is this a common observation for these two optimizers, or is it data dependent?

Upvotes: 0

Views: 977

Answers (2)

nessuno

Reputation: 27050

Optimization using Adam runs more slowly than optimization using momentum because the former, being an adaptive learning rate algorithm, has to maintain for every parameter an exponential moving average of the first and second moments of the gradient. Momentum, instead, only keeps a single velocity buffer per parameter and does not compute a different effective learning rate for each one.

Your observation is therefore correct, but it is not data dependent: the optimization algorithm itself does additional computation, so the execution time of every training step is longer.

The advantage is that with an adaptive learning rate algorithm you often reach a minimum in fewer steps, even though each individual step takes longer.
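As an illustration, here is a minimal NumPy sketch of both update rules (the hyperparameter names and default values are the usual conventions, not taken from the question). It shows the extra per-parameter state and element-wise work Adam does on every step:

    import numpy as np

    def momentum_step(w, grad, v, lr=0.01, mu=0.9):
        # Momentum keeps one state buffer (the velocity) per parameter
        # and applies the same scalar learning rate everywhere.
        v = mu * v - lr * grad
        return w + v, v

    def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam keeps two state buffers (EMAs of the first and second
        # moments of the gradient) and adds a bias correction plus an
        # element-wise sqrt and divide on every step.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v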

Upvotes: 0

Jeremiah Johnson

Reputation: 143

It might depend on your framework; see, for instance, this issue for MXNet: https://github.com/dmlc/mxnet/issues/1516. In my experience Adam tends to converge in fewer epochs, though I realize that's not the same as the optimizer running each step quickly.
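If you want to measure the per-step overhead yourself, a rough micro-benchmark along these lines can help (this sketch assumes PyTorch, which the question doesn't specify; the model and batch sizes are arbitrary illustration choices):

    import time
    import torch

    # Dummy model and batch; sizes are arbitrary.
    model = torch.nn.Linear(1024, 1024)
    x = torch.randn(256, 1024)

    # Time 100 full steps (forward, backward, update) per optimizer.
    for opt in (torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
                torch.optim.Adam(model.parameters(), lr=1e-3)):
        start = time.perf_counter()
        for _ in range(100):
            opt.zero_grad()
            model(x).sum().backward()
            opt.step()
        print(type(opt).__name__, time.perf_counter() - start)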

Upvotes: 0
