Ink

Reputation: 953

Why is the model size so different between optimizers?

With TensorFlow, my model size (model.ckpt.data) is 88M when the optimizer is tf.train.GradientDescentOptimizer, but it grows to 220M when the optimizer is changed to tf.train.AdamOptimizer.

Why is there such a huge difference?

Upvotes: 1

Views: 271

Answers (1)

Dr. Snoopy

Reputation: 56347

Adam keeps two running averages (one of the gradient and one of the squared gradient) as additional non-trainable variables for each trainable parameter, which roughly triples the total number of stored parameters. These non-trainable variables are saved in the checkpoint as well, because they are needed to resume training. That is why the model checkpoint is bigger.
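
A minimal sketch of how you could verify this yourself, assuming a TF 1.x-style checkpoint written under the question's setup (the ckpt_path below is a placeholder for your own checkpoint prefix):

```python
import tensorflow as tf

# Placeholder: point this at your own checkpoint prefix (e.g. "model.ckpt").
ckpt_path = "model.ckpt"

# tf.train.list_variables returns (name, shape) pairs for every variable
# stored in the checkpoint. With AdamOptimizer, each trainable variable "w"
# typically shows up three times: "w" itself, plus the slot variables
# "w/Adam" (first moment) and "w/Adam_1" (second moment).
for name, shape in tf.train.list_variables(ckpt_path):
    print(name, shape)
```

With plain GradientDescentOptimizer there are no slot variables, so only the weights themselves are written, which is why that checkpoint is roughly a third of the size.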

Upvotes: 2
