Ink

Reputation: 953

Why is the model size so different between optimizers?

With TensorFlow, my model size (model.ckpt.data) is 88M when the optimizer is tf.train.GradientDescentOptimizer, but it grows to 220M when the optimizer is changed to tf.train.AdamOptimizer.

Why is there such a huge difference?

Upvotes: 1

Views: 271

Answers (1)

Dr. Snoopy

Reputation: 56347

Adam keeps two running averages (one of the gradient and one of the squared gradient) as additional non-trainable variables for each trainable parameter, which roughly triples the total number of stored parameters. These non-trainable variables are saved in the checkpoint as well, because they are needed to resume training. That is why the model checkpoint is bigger.
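
A minimal sketch of how you could verify this yourself, assuming a TF 1.x-style checkpoint written under the question's setup (the ckpt_path below is a placeholder for your own checkpoint prefix):

```python
import tensorflow as tf

# Placeholder: point this at your own checkpoint prefix (e.g. "model.ckpt").
ckpt_path = "model.ckpt"

# tf.train.list_variables returns (name, shape) pairs for every variable
# stored in the checkpoint. With AdamOptimizer, each trainable variable "w"
# typically shows up three times: "w" itself, plus the slot variables
# "w/Adam" (first moment) and "w/Adam_1" (second moment).
for name, shape in tf.train.list_variables(ckpt_path):
    print(name, shape)
```

With plain GradientDescentOptimizer there are no slot variables, so only the weights themselves are written, which is why that checkpoint is roughly a third of the size.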

Upvotes: 2
