Reputation: 1129
I've been using TensorFlow with tf.train.Supervisor:
sv = tf.train.Supervisor(logdir=path, save_model_secs=900)
with sv.managed_session() as sess:
    if not sv.should_stop():
        # Rest of the code
Recently it crashed during training, and since then it has been throwing the error below at the with sv.managed_session() line above:
DataLossError (see above for traceback): Checksum does not match: stored 1057608875 vs. calculated on the restored bytes 763056116
[[Node: save/RestoreV2_31 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_31/tensor_names, save/RestoreV2_31/shape_and_slices)]]
Is it possible to fix it?
Upvotes: 4
Views: 8033
Reputation: 5206
This means your checkpoint file got corrupted. Delete the files belonging to the latest checkpoint (i.e. the one with the largest global_step number) from your logdir, then try again and it should work.
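To find which files belong to the latest checkpoint, a small helper along these lines might do. It is only a sketch: it assumes the files follow the model.ckpt-<global_step>.<suffix> naming that Supervisor uses when checkpoint_basename is left at its default, and the function names are made up for illustration. It only lists files and deletes nothing.

```python
import os
import re
from collections import defaultdict


def group_checkpoints_by_step(logdir, basename="model.ckpt"):
    """Map each global step to the checkpoint files saved for it.

    Assumes file names of the form <basename>-<global_step>[.<suffix>],
    e.g. model.ckpt-900.index or model.ckpt-900.data-00000-of-00001.
    """
    pattern = re.compile(r"^" + re.escape(basename) + r"-(\d+)(\..+)?$")
    by_step = defaultdict(list)
    for name in sorted(os.listdir(logdir)):
        match = pattern.match(name)
        if match:
            by_step[int(match.group(1))].append(os.path.join(logdir, name))
    return dict(by_step)


def latest_checkpoint_files(logdir, basename="model.ckpt"):
    """Return (step, files) for the largest global step, or None if none exist."""
    by_step = group_checkpoints_by_step(logdir, basename)
    if not by_step:
        return None
    step = max(by_step)
    return step, by_step[step]
```

Calling latest_checkpoint_files(path) with the same path passed to Supervisor returns the step number and the files to move aside or delete. Moving them to a backup directory first is the safer choice. The text file named checkpoint in that directory may still name the removed step. If the restore then fails with a not-found error, editing its model_checkpoint_path entry to point at an older step is probably needed as well.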
Upvotes: 5