Arun

Reputation: 2478

TensorFlow 2.0 learning rate scheduler with tf.GradientTape

I am using TensorFlow 2.0 with Python 3.8, and I want to use a learning rate scheduler for which I have a function. I have to train a neural network for 160 epochs with the scheduler below, where the learning rate is decreased by a factor of 10 at epochs 80 and 120 and the initial learning rate is 0.01.

def scheduler(epoch, current_learning_rate):
    if epoch == 79 or epoch == 119:
        return current_learning_rate / 10
    else:
        return min(current_learning_rate, 0.001)

How can I use this learning rate scheduler function with tf.GradientTape()? I know how to use it with model.fit() as a callback:

callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
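
and then pass it to fit, roughly like this (model, x_train and y_train stand in for my model and data):

model.fit(x_train, y_train, epochs=160, callbacks=[callback])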

How do I use it in a custom training loop with tf.GradientTape()?

Thanks!

Upvotes: 4

Views: 5619

Answers (2)

AlexP

Reputation: 437

A learning rate schedule needs a step value, which cannot be specified when using tf.GradientTape followed by optimizer.apply_gradients().

So you should not pass the schedule directly as the learning_rate of the optimizer.

Instead, you can first call the schedule to get the value for the current step and then update the learning rate on the optimizer:

import tensorflow as tf

optim = tf.keras.optimizers.SGD()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(1e-2, 1000, 0.9)

for step in range(1000):
    lr = lr_schedule(step)        # evaluate the schedule at the current step
    optim.learning_rate = lr      # update the optimizer's learning rate
    with tf.GradientTape() as tape:
        loss = compute_loss()     # placeholder: call the function you want to differentiate
    grads = tape.gradient(loss, model.trainable_variables)  # model is assumed to exist
    optim.apply_gradients(zip(grads, model.trainable_variables))
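
The same idea works with the epoch-based scheduler from the question: evaluate it once per epoch and assign the result to the optimizer before running the inner batch loop. A rough sketch, where model, loss_fn and train_dataset are assumed to be defined elsewhere:

optim = tf.keras.optimizers.SGD(learning_rate=0.01)
for epoch in range(160):
    # re-evaluate the question's scheduler once per epoch
    current_lr = float(optim.learning_rate)
    optim.learning_rate = scheduler(epoch, current_lr)
    for x_batch, y_batch in train_dataset:
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch, model(x_batch, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optim.apply_gradients(zip(grads, model.trainable_variables))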

Upvotes: 2

Iswariya Manivannan

Reputation: 724

The learning rate for different epochs can be set using the lr attribute of the TensorFlow Keras optimizer. The lr attribute still exists because TensorFlow 2 keeps backward compatibility with Keras (for more details, refer to the source code here). Below is a small snippet showing how the learning rate can be varied across epochs. self._train_step is similar to the train_step function defined here.

def set_learning_rate(self, epoch):
    # Adjust the optimizer's lr based on the current epoch.
    if epoch > 180:
        self.optimizer.lr = 0.5e-6
    elif epoch > 160:
        self.optimizer.lr = 1e-6
    elif epoch > 120:
        self.optimizer.lr = 1e-5
    elif epoch > 3:
        self.optimizer.lr = 1e-4

def train(self, epochs, train_data, val_data):
    prev_val_loss = float('inf')
    for epoch in range(epochs):
        self.set_learning_rate(epoch)
        for images, labels in train_data:
            self._train_step(images, labels)
        for images, labels in val_data:
            self._test_step(images, labels)
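
For reference, a _train_step in this style is just an ordinary tf.GradientTape step. A minimal sketch, assuming the model, loss object and optimizer are attributes of the same class:

@tf.function
def _train_step(self, images, labels):
    with tf.GradientTape() as tape:
        predictions = self.model(images, training=True)
        loss = self.loss_object(labels, predictions)
    gradients = tape.gradient(loss, self.model.trainable_variables)
    # apply_gradients uses whatever learning rate the optimizer currently holds
    self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))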

Another alternative is to use tf.keras.optimizers.schedules:

learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [80 * num_steps, 120 * num_steps, 160 * num_steps, 180 * num_steps],
    [1e-3, 1e-4, 1e-5, 1e-6, 5e-6]
)

optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_fn)

Note that the boundaries cannot be given directly in epochs; they have to be given in optimizer steps, where the number of steps per epoch is len(train_data) / batch_size.
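
For the exact schedule in the question (initial learning rate 0.01, divided by 10 at epochs 80 and 120 over 160 epochs), a sketch could look like this, where steps_per_epoch is a placeholder computed from your own dataset and batch size:

steps_per_epoch = len(train_data) // batch_size  # placeholder
learning_rate_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [80 * steps_per_epoch, 120 * steps_per_epoch],
    [1e-2, 1e-3, 1e-4]  # 0.01 until epoch 80, then 0.001, then 0.0001
)
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate_fn)

When the schedule object is passed as learning_rate, the optimizer evaluates it at its own iteration counter, which apply_gradients() increments in a custom tf.GradientTape loop.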

Upvotes: 4
