reese0106

Reputation: 2061

Google Cloud ML Engine "Skipping evaluation due to same checkpoint"

So I have an ML Engine package based on the census tutorial, and I am trying to run evaluation every N steps using the --min-eval-frequency flag. However, I keep getting this message in the Stackdriver logs: "Skipping evaluation due to same checkpoint...". As a result, evaluation only happens once per epoch (I guess because the checkpoint finally changes at that point). Are additional changes needed to update the checkpoints more frequently? Any idea why this wouldn't evaluate more frequently?

Upvotes: 1

Views: 766

Answers (1)

Eli Bixby

Reputation: 1178

Checkpoints are saved at a fixed frequency. If no new checkpoint has been written by the time the next evaluation is scheduled, you'll get the message "Skipping evaluation due to same checkpoint...". This is because evaluation needs to run on frozen weights in a separate tf.Session so that the weights don't change during evaluation, and the only way to pass these weights between sessions is through a checkpoint. So if you want to evaluate more often and you are seeing that message, increase your checkpoint frequency. You can do this by adding a flag that populates tf.contrib.learn.RunConfig#save_checkpoints_steps.
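
To make the interaction between the two frequencies concrete, here is a toy simulation in plain Python. It is not TensorFlow code and not how the library is implemented; the function name `simulate` and its parameters are made up for illustration, and it assumes checkpoints are saved on a fixed step interval (what save_checkpoints_steps controls) rather than on a timer.

```python
def simulate(total_steps, eval_every, checkpoint_every):
    """Toy model: an evaluation only runs if a new checkpoint exists.

    Hypothetical helper for illustration only. `eval_every` stands in for
    --min-eval-frequency and `checkpoint_every` for save_checkpoints_steps.
    Returns (steps where evaluation ran, steps where it was skipped).
    """
    latest_checkpoint = None   # step of the most recent checkpoint
    last_evaluated = None      # checkpoint used by the last evaluation
    ran, skipped = [], []

    for step in range(1, total_steps + 1):
        # Training writes a checkpoint on its own schedule.
        if step % checkpoint_every == 0:
            latest_checkpoint = step

        # An evaluation is scheduled on a separate schedule.
        if step % eval_every == 0:
            if latest_checkpoint is None or latest_checkpoint == last_evaluated:
                # Nothing new to evaluate: "Skipping evaluation due to same checkpoint"
                skipped.append(step)
            else:
                last_evaluated = latest_checkpoint
                ran.append(step)

    return ran, skipped


if __name__ == "__main__":
    # Evaluation requested every 100 steps, checkpoints only every 500 steps.
    print(simulate(1000, eval_every=100, checkpoint_every=500))
    # Checkpoints as frequent as evaluations.
    print(simulate(1000, eval_every=100, checkpoint_every=100))
```

In this model, with checkpoints every 500 steps only the evaluations at steps 500 and 1000 run and the other eight are skipped. Once the checkpoint interval matches the evaluation interval, all ten run.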

Upvotes: 2
