Reputation: 13055
I am trying to run an image segmentation code based on the U-Net architecture. During experimentation, I found that the Adam optimizer runs much more slowly than the momentum optimizer. Is this a common observation for these two optimizers, or is it data-dependent?
Upvotes: 0
Views: 977
Reputation: 27050
Optimization with Adam runs more slowly than optimization with Momentum because Adam, being an adaptive learning rate algorithm, has to maintain for every parameter an exponential moving average of both the first and second moments of the gradient, and then compute a per-parameter step size from them. Momentum only needs a single velocity buffer and applies the same learning rate to every parameter, so its update is cheaper.
Your observation is therefore correct, and it is not data-dependent: the optimization algorithm itself does additional computation, so each training step takes longer.
The trade-off is that an adaptive learning rate algorithm often reaches a minimum in fewer steps, even though each individual step is slower.
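To make the extra bookkeeping concrete, here is a minimal NumPy sketch of the two update rules (not any framework's internals; hyperparameter names like `lr`, `beta1`, `beta2` are the standard ones, not values from the question):

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    # Momentum keeps one extra buffer (velocity) and applies the same
    # scalar learning rate to every parameter.
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps two extra buffers per parameter (first- and second-moment
    # EMAs), applies bias correction, and divides by a per-parameter scale,
    # so each step does noticeably more arithmetic.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Example: one update on a toy weight tensor.
w = np.random.randn(3, 3)
grad = np.random.randn(3, 3)
w_mom, vel = momentum_step(w, grad, velocity=np.zeros_like(w))
w_adam, m, v = adam_step(w, grad, m=np.zeros_like(w), v=np.zeros_like(w), t=1)
```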
Upvotes: 0
Reputation: 143
It might depend on your framework; see, for instance, this issue for MXNet: https://github.com/dmlc/mxnet/issues/1516. In my experience Adam tends to converge in fewer epochs, though I realize that is not the same as the optimizer running faster per step.
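If you want to check how much of the difference comes from your own framework, a rough timing sketch like the following (here with `tf.keras`; the toy model and random data are placeholders, you would swap in your U-Net and dataset) can compare per-epoch cost:

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder data standing in for your segmentation inputs and masks.
x = np.random.rand(256, 64, 64, 1).astype("float32")
y = np.random.randint(0, 2, size=(256, 64, 64, 1)).astype("float32")

def time_optimizer(optimizer):
    # Tiny stand-in model; replace with your U-Net.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                               input_shape=(64, 64, 1)),
        tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
    ])
    model.compile(optimizer=optimizer, loss="binary_crossentropy")
    model.fit(x, y, epochs=1, batch_size=32, verbose=0)  # warm-up / graph tracing
    start = time.perf_counter()
    model.fit(x, y, epochs=3, batch_size=32, verbose=0)
    return (time.perf_counter() - start) / 3  # average seconds per epoch

print("SGD+momentum epoch time:", time_optimizer(tf.keras.optimizers.SGD(momentum=0.9)))
print("Adam epoch time:        ", time_optimizer(tf.keras.optimizers.Adam()))
```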
Upvotes: 0