Gianluca Micchi

Reputation: 1653

Save a tensorflow model after a fixed training time

I'm training a model on a server that allows only one hour of computation; when that time is up, it simply kills my job. I would like TensorFlow to save the results of its training after, say, 58 minutes of training, whatever the current state. I'm OK with it saving the state at the last completed epoch; I just want to have an idea of what's going on. How can I do that?

Upvotes: 3

Views: 711

Answers (2)

alessiosavi

Reputation: 3037

Of course: you can define a callback that stops the training phase.

You can have a look here for further information:
https://towardsdatascience.com/neural-network-with-tensorflow-how-to-stop-training-using-callback-5c8d575c18a9

That example creates a callback that stops the training phase when the accuracy exceeds a threshold. You can modify the callback so that it checks the elapsed time instead.


This is a working piece of code:

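import time

from tensorflow.keras.callbacks import Callback


class TimeOut(Callback):
    def __init__(self, t0, timeout):
        super().__init__()
        self.t0 = t0  # start time, as returned by time.time()
        self.timeout = timeout  # time budget in minutes

    def on_train_batch_end(self, batch, logs=None):
        elapsed = time.time() - self.t0  # seconds since t0
        if elapsed > self.timeout * 60:
            print(f"\nReached {elapsed / 60:.3f} minutes of training, stopping")
            self.model.stop_training = True  # fit() returns once this is set


# Pass this list to model.fit(..., callbacks=callbacks)
callbacks = [TimeOut(t0=time.time(), timeout=58)]  # 58 minutes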
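Note that the callback above only stops training; the question also asks for the model to be saved. Here is a minimal, framework-agnostic sketch of the "stop at the last completed epoch, then save" logic. The names `TimeBudget`, `train_with_budget`, `run_epoch` and `save_checkpoint` are hypothetical and not part of TensorFlow. In Keras, I'd assume the same check would live in the callback and the save would be a call to `model.save(...)` after `fit` returns.

```python
import time
from typing import Callable


class TimeBudget:
    """Tracks whether a wall-clock budget (in seconds) has been used up."""

    def __init__(self, seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._start = clock()        # the budget starts counting at construction
        self._seconds = seconds

    def elapsed(self) -> float:
        return self._clock() - self._start

    def expired(self) -> bool:
        return self.elapsed() >= self._seconds


def train_with_budget(
    run_epoch: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
    max_epochs: int,
    budget: TimeBudget,
) -> int:
    """Run epochs until max_epochs or until the budget expires, then save once.

    Returns the number of completed epochs.
    """
    completed = 0
    for epoch in range(max_epochs):
        run_epoch(epoch)             # hypothetical: one full epoch of training
        completed += 1
        if budget.expired():         # checked only between epochs
            break
    save_checkpoint(completed)       # hypothetical: e.g. write the model to disk
    return completed
```

Because the check only runs between epochs, the budget should sit below the hard limit by at least the duration of one epoch plus the time needed to save. With very long epochs, a per-batch check like the one in `TimeOut` is the safer choice.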

Upvotes: 4

Marc225

Reputation: 91

TensorFlow Addons recently added a callback that does exactly this.

In your case it would look something like this:

import tensorflow_addons as tfa

time_stopping_callback = tfa.callbacks.TimeStopping(seconds=60 * 58, verbose=1)  # 58 minutes

model.fit(........, callbacks=[time_stopping_callback])

Link: https://www.tensorflow.org/addons/tutorials/time_stopping

Upvotes: 0
