I ask TensorFlow to save the model every 100 iterations in every epoch; my code is below. But after 900 iterations, only the models for the 500th, 600th, 700th, 800th and 900th iterations were saved.
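```python
import time

import tensorflow as tf

# init_op, train_init_op, accuracy, loss, optimizer, training, saver and args
# are defined earlier in the script (not shown).
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(init_op)
    global_step = 0  # counts batches across all epochs
    for i in range(args.num_epochs):
        start_time = time.time()
        k = 0
        acc_train = 0
        # initialize the iterator to train_dataset
        sess.run(train_init_op)
        while True:
            try:
                accu, l, _ = sess.run([accuracy, loss, optimizer], feed_dict={training: True})
                k += 1
                global_step += 1
                acc_train += accu
                if k % 100 == 0:
                    print('Epoch: {}, step: {}, training loss: {:.3f}, training accuracy: {:.2f}%'.format(i, k, l, accu * 100))
                    # running counter: (i+1)*k would repeat step numbers across epochs
                    saver.save(sess, args.saved_model_path, global_step=global_step)
            except tf.errors.OutOfRangeError:
                break
```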
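A side note on the checkpoint step number: a formula such as global_step=(i+1)*k, where k restarts at zero every epoch, repeats step numbers across epochs (epoch 0, step 200 and epoch 1, step 100 both give 200), so a later save would write to the same checkpoint prefix as an earlier one. The plain-Python sketch below only counts duplicate save steps under that formula and under a running batch counter; it does not call TensorFlow, and the two epochs of 900 batches are a hypothetical assumption.

```python
from collections import Counter


def checkpoint_steps(num_epochs, batches_per_epoch, every=100, use_running_counter=False):
    """Return the global_step value passed to each save (hypothetical helper)."""
    steps = []
    global_step = 0  # running count of batches across all epochs
    for i in range(num_epochs):
        k = 0  # per-epoch batch count, reset every epoch as in the question
        for _ in range(batches_per_epoch):
            k += 1
            global_step += 1
            if k % every == 0:
                steps.append(global_step if use_running_counter else (i + 1) * k)
    return steps


def duplicates(steps):
    """Step numbers that would be saved more than once."""
    return sorted(s for s, n in Counter(steps).items() if n > 1)


print(duplicates(checkpoint_steps(2, 900)))                            # [200, 400, 600, 800]
print(duplicates(checkpoint_steps(2, 900, use_running_counter=True)))  # []
```

With a running counter every save gets a distinct number, so no checkpoint name is reused.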
These are the training accuracies:
Epoch: 0, step: 100, training loss: 0.669, training accuracy: 59.38%
Epoch: 0, step: 200, training loss: 0.806, training accuracy: 54.69%
Epoch: 0, step: 300, training loss: 0.781, training accuracy: 57.81%
Epoch: 0, step: 400, training loss: 0.725, training accuracy: 64.06%
Epoch: 0, step: 500, training loss: 0.347, training accuracy: 89.06%
Epoch: 0, step: 600, training loss: 0.193, training accuracy: 89.06%
Epoch: 0, step: 700, training loss: 0.003, training accuracy: 100.00%
Epoch: 0, step: 800, training loss: 0.190, training accuracy: 98.44%
Epoch: 0, step: 900, training loss: 0.009, training accuracy: 100.00%
Why did TensorFlow not save models for the 100th, 200th, 300th and 400th iterations? Thank you!
It did, but I'm guessing the Saver instance you created uses the default max_to_keep value of 5, so each new checkpoint caused the oldest one to be deleted, leaving only the five most recent (500 through 900). To keep 10, change your saver creation line to
```python
saver = tf.train.Saver(max_to_keep=10)
```
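To see why exactly 500 through 900 survive, here is a simplified plain-Python model of that retention rule (an assumption about the behaviour, not TensorFlow's own code): each save appends a checkpoint, and once more than max_to_keep exist the oldest is dropped. Per the TF 1.x Saver documentation, max_to_keep=None or 0 deletes nothing, which the model mirrors.

```python
def kept_checkpoints(save_steps, max_to_keep=5):
    """Simplified model of Saver retention: keep the max_to_keep most recent
    checkpoints; None or 0 means nothing is deleted."""
    recent = []
    for step in save_steps:
        recent.append(step)
        while max_to_keep and len(recent) > max_to_keep:
            recent.pop(0)  # the oldest checkpoint's files would be deleted
    return recent


saves = list(range(100, 1000, 100))  # saves at steps 100, 200, ..., 900 from the log
print(kept_checkpoints(saves))                    # [500, 600, 700, 800, 900]
print(kept_checkpoints(saves, max_to_keep=10))    # all nine steps
print(kept_checkpoints(saves, max_to_keep=None))  # all nine steps
```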
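You might also want to look at the keep_checkpoint_every_n_hours argument if you don't want to keep -every- one: it preserves roughly one checkpoint per N hours of training in addition to the max_to_keep most recent ones.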
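A rough plain-Python model of how keep_checkpoint_every_n_hours combines with max_to_keep, offered as a sketch under stated assumptions rather than TensorFlow's exact logic: when a checkpoint falls out of the recent window, it is kept permanently instead of deleted if it was written after the next N-hour mark, and that mark then advances by N hours. The save schedule (one save per hour) and the function name are hypothetical.

```python
def kept_with_hours(saves, max_to_keep, every_n_hours):
    """saves: list of (step, hours_since_saver_creation). Simplified model:
    recent checkpoints are capped at max_to_keep; one evicted checkpoint per
    N-hour interval is preserved instead of deleted."""
    recent, preserved = [], []
    next_mark = every_n_hours  # first N-hour mark after the Saver is created
    for step, hour in saves:
        recent.append((step, hour))
        while len(recent) > max_to_keep:
            old_step, old_hour = recent.pop(0)
            if old_hour > next_mark:
                preserved.append(old_step)  # kept for good
                next_mark += every_n_hours
            # otherwise the evicted checkpoint is deleted
    return preserved + [step for step, _ in recent]


# Hypothetical schedule: a save every 100 steps, one hour apart, starting at 0.5 h.
saves = [(100 * n, n - 0.5) for n in range(1, 13)]
print(kept_with_hours(saves, max_to_keep=5, every_n_hours=3))
# [400, 700, 800, 900, 1000, 1100, 1200]
```

Here 400 and 700 survive as the "every 3 hours" checkpoints, while the five most recent are kept as usual.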