Reputation: 356

How to undo the last training step in a Session in TensorFlow?

Is there a way to undo the last training step, for example when the loss value is NaN?

import numpy as np
...
for step in range(num_epoch):
    _, loss_value = sess.run([train_op, loss])
    if np.isnan(loss_value):
        # Wanted: something like sess.undo_last() (hypothetical method)
        break
...

If there is such a method, does it also work for multi-GPU training?

Upvotes: 2

Answers (1)

Sorin

Reputation: 11968

There's no such method. However, you can do something like this. In your model, add:

loss = tf.check_numerics(loss, "loss is NaN or Inf")

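This raises a `tf.errors.InvalidArgumentError` when the loss becomes NaN or Inf. The check runs on the forward pass, so the failing step stops before any weights are modified, provided the update ops depend on the checked loss (building the train op inside `with tf.control_dependencies([loss]):` guarantees this). Your example code would then look like: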

import tensorflow as tf

for step in range(num_epoch):
    try:
        sess.run(train_op)
    except tf.errors.InvalidArgumentError:
        # Raised by tf.check_numerics when the loss is NaN or Inf
        break
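A fuller, self-contained sketch of the same idea, written against `tensorflow.compat.v1` so it runs in graph mode on recent TensorFlow 1.x releases and on TensorFlow 2.x with v2 behavior disabled. The toy model (`x`, `y`, `w`), the learning rate, and the `train` helper are hypothetical; the `tf.control_dependencies` block is an addition that makes every op the optimizer builds wait for the check.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Hypothetical toy model: fit y = w * x with a single weight.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(1.0, name="w")
raw_loss = tf.reduce_mean(tf.square(w * x - y))

# Raises InvalidArgumentError at run time if the loss is NaN or Inf.
loss = tf.check_numerics(raw_loss, "loss is NaN or Inf")

# Ops created in this block (gradients and weight updates) get a control
# dependency on the check, so a failing step never updates `w`.
with tf.control_dependencies([loss]):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)


def train(sess, batches):
    """Runs one step per batch; returns the index of the first failing step."""
    for step, (xb, yb) in enumerate(batches):
        try:
            sess.run(train_op, feed_dict={x: xb, y: yb})
        except tf.errors.InvalidArgumentError:
            return step
    return len(batches)


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    good = (np.array([1.0, 2.0], np.float32), np.array([2.0, 4.0], np.float32))
    bad = (np.array([np.nan], np.float32), np.array([0.0], np.float32))
    failed_at = train(sess, [good, bad, good])
    w_value = sess.run(w)  # still the value after the last good step
```

Without the control dependency, whether the check runs before the update depends on how the gradient graph happens to be wired, so the explicit dependency is the safer choice.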

This probably won't help much, though: a NaN or Inf loss usually means the model is already in a bad state. Try different activation functions or a simpler model so that training doesn't reach that state.

Alternatively, save checkpoints (for example, every X steps) and restore the last one saved before the error.
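A minimal sketch of the checkpoint approach with `tf.train.Saver`, again using `tensorflow.compat.v1`. The toy model, `SAVE_EVERY` (standing in for "X"), the temporary checkpoint directory, and the `train` helper are all hypothetical choices for illustration. Note that `sess.run([train_op, loss])` applies the update in the same call that reports the NaN, which is why a restore is needed afterwards.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Hypothetical toy model; replace with your own graph.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(1.0, name="w")
loss = tf.reduce_mean(tf.square(w * x - y))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

SAVE_EVERY = 2                # the "X" from the text; arbitrary here
ckpt_dir = tempfile.mkdtemp()  # hypothetical location for checkpoints
saver = tf.train.Saver()       # saves all global variables, incl. optimizer state


def train(sess, batches):
    """Checkpoints every SAVE_EVERY steps. On a non-finite loss, restores the
    most recent checkpoint and returns the failing step; otherwise None."""
    for step, (xb, yb) in enumerate(batches):
        if step % SAVE_EVERY == 0:
            saver.save(sess, os.path.join(ckpt_dir, "model"), global_step=step)
        _, loss_value = sess.run([train_op, loss], feed_dict={x: xb, y: yb})
        if not np.isfinite(loss_value):
            # The bad update was already applied; roll back to the checkpoint.
            saver.restore(sess, tf.train.latest_checkpoint(ckpt_dir))
            return step
    return None


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    good = (np.array([1.0, 2.0], np.float32), np.array([2.0, 4.0], np.float32))
    bad = (np.array([np.nan], np.float32), np.array([0.0], np.float32))
    failed_at = train(sess, [good, good, good, bad])
    w_value = sess.run(w)  # value saved at the start of step 2
```

Restoring the latest checkpoint can also discard some good steps (here, the update from step 2); saving more often loses less progress at the cost of more disk I/O.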

Upvotes: 1
