Reputation: 13055
I am trying to run an image segmentation code based on the U-Net architecture. During experimentation, I found that the Adam optimizer runs much more slowly than the momentum optimizer. Is this a common observation for these two optimizers, or is it data-dependent?
Upvotes: 0
Views: 977
Reputation: 27050
Optimization with Adam runs more slowly than optimization with Momentum because Adam, being an adaptive learning rate algorithm, has to maintain for every parameter an exponential moving average of both the first and second moments of the gradient, and then compute a per-parameter step size from them. Momentum only needs a single velocity buffer and applies the same learning rate to every parameter, so its update is cheaper.
Your observation is therefore correct, and it is not data-dependent: the optimization algorithm itself does additional computation, so each training step takes longer.
The trade-off is that an adaptive learning rate algorithm often reaches a minimum in fewer steps, even though each individual step is slower.
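To make the extra bookkeeping concrete, here is a minimal NumPy sketch of the two update rules (not any framework's internals; hyperparameter names like `lr`, `beta1`, `beta2` are the standard ones, not values from the question):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # Momentum keeps one extra buffer (velocity) and applies the same
    # scalar learning rate to every parameter.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps two extra buffers per parameter (first- and second-moment
    # EMAs), applies bias correction, and divides by a per-parameter scale,
    # so each step does noticeably more arithmetic.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Example: one update on a toy weight tensor.
w = np.random.randn(3, 3)
grad = np.random.randn(3, 3)
w_mom, vel = momentum_step(w, grad, velocity=np.zeros_like(w))
w_adam, m, v = adam_step(w, grad, m=np.zeros_like(w), v=np.zeros_like(w), t=1)
```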
Upvotes: 0
Reputation: 143
It might depend on your framework; see, for instance, this issue for MXNet: https://github.com/dmlc/mxnet/issues/1516. In my experience Adam tends to converge in fewer epochs, though I realize that is not the same as the optimizer running faster per step.
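If you want to check how much of the difference comes from your own framework, a rough timing sketch like the following (here with `tf.keras`; the toy model and random data are placeholders, you would swap in your U-Net and dataset) can compare per-epoch cost:

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder data standing in for your segmentation inputs and masks.
x = np.random.rand(256, 64, 64, 1).astype("float32")
y = np.random.randint(0, 2, size=(256, 64, 64, 1)).astype("float32")

def time_optimizer(optimizer):
    # Tiny stand-in model; replace with your U-Net.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(64, 64, 1)),
        tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
    ])
    model.compile(optimizer=optimizer, loss="binary_crossentropy")
    model.fit(x, y, epochs=1, batch_size=32, verbose=0)  # warm-up / graph tracing
    start = time.perf_counter()
    model.fit(x, y, epochs=3, batch_size=32, verbose=0)
    return (time.perf_counter() - start) / 3  # average seconds per epoch

print("SGD+momentum epoch time:", time_optimizer(tf.keras.optimizers.SGD(momentum=0.9)))
print("Adam epoch time:        ", time_optimizer(tf.keras.optimizers.Adam()))
```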
Upvotes: 0