I'm training a model on a server that allows me only one hour of computation: at the end of that time, it simply kills my job. I would like TensorFlow to save the results of its training after, say, 58 minutes, regardless of its current state. I'm OK with it saving the state at the last completed epoch; I just want to have an idea of what's going on. How can I do that?
Yes, you can define a callback that stops training.
See this article for more information:
https://towardsdatascience.com/neural-network-with-tensorflow-how-to-stop-training-using-callback-5c8d575c18a9
That example creates a callback that stops training once the accuracy exceeds a threshold. You can adapt it to check the elapsed time instead.
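Here is a working version. The callback only stops training, so save the model yourself once fit returns:
import time
from tensorflow.keras.callbacks import Callback

class TimeOut(Callback):
    def __init__(self, t0, timeout):
        super().__init__()
        self.t0 = t0
        self.timeout = timeout  # time limit in minutes

    def on_train_batch_end(self, batch, logs=None):
        # Checked after every batch: stop once the elapsed time exceeds the limit
        if time.time() - self.t0 > self.timeout * 60:
            print(f"\nReached {(time.time() - self.t0) / 60:.3f} minutes of training, stopping")
            self.model.stop_training = True

callbacks = [TimeOut(t0=time.time(), timeout=58)]  # 58 minutes
model.fit(..., callbacks=callbacks)
model.save("model_after_timeout.keras")  # hypothetical file name; saves what was learned so far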
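If you would rather stop only at epoch boundaries (the question says saving the last completed epoch is fine), one option is to estimate whether the next epoch will still finish before the deadline. The sketch below is a hypothetical, framework-independent helper (TimeBudget is not part of TensorFlow); it assumes the next epoch takes about as long as the slowest epoch seen so far, and it uses time.monotonic so that system clock adjustments do not affect the measurement.

```python
import time
from typing import Callable, List


class TimeBudget:
    """Hypothetical helper: decide whether another epoch fits in a wall-clock budget."""

    def __init__(self, budget_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.start = clock()
        self.deadline = self.start + budget_seconds
        self.epoch_start = self.start
        self.durations: List[float] = []  # seconds taken by each finished epoch

    def epoch_begin(self) -> None:
        """Call at the start of every epoch."""
        self.epoch_start = self.clock()

    def epoch_end(self) -> bool:
        """Record the finished epoch; return True if training should stop now."""
        now = self.clock()
        self.durations.append(now - self.epoch_start)
        # Assumption: the next epoch takes as long as the slowest one so far.
        expected_next = max(self.durations)
        return now + expected_next > self.deadline
```

In a Keras callback you could call epoch_begin() from on_epoch_begin and, in on_epoch_end, set self.model.stop_training = True when epoch_end() returns True. The trade-off compared with the per-batch check above is that you leave some unused time at the end, but training never stops in the middle of an epoch.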
TensorFlow Addons includes a callback, TimeStopping, that does exactly this.
In your case it would look something like this:
import tensorflow_addons as tfa
time_stopping_callback = tfa.callbacks.TimeStopping(seconds=60*58, verbose=1) #58min
model.fit(........, callbacks = [time_stopping_callback])
Link: https://www.tensorflow.org/addons/tutorials/time_stopping
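Note that TimeStopping, like the custom callback in the first answer, only tells Keras to stop; nothing is written to disk unless you call model.save after model.fit returns or add tf.keras.callbacks.ModelCheckpoint. As far as I can tell it also checks the clock only at the end of each epoch, so a long epoch can overrun the limit. Because the job may be killed while a file is being written, it can help to write any small progress record (hypothetical fields such as epoch and loss) with the write-then-rename pattern below, so the previous file is never left half-written. This is a plain-Python sketch, not a Keras API.

```python
import json
import os
import tempfile


def save_checkpoint_atomically(state: dict, path: str) -> None:
    """Write `state` as JSON so that `path` is never left half-written.

    The data goes to a temporary file in the same directory, then
    os.replace moves it over `path` (atomic on POSIX systems).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last saved state, or `default` if nothing was saved yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)
```

Calling save_checkpoint_atomically from on_epoch_end means that, even if the job is killed at the one-hour mark, you still find the last completed epoch's record on disk.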