Arun

Reputation: 2478

Using learning rate schedule and learning rate warmup with TensorFlow2

I have to train a VGG-19 CNN on CIFAR-10 using learning rate warmup: the learning rate is warmed up from 0.00001 to 0.1 over the first 10000 iterations (approximately 13 epochs). For the remainder of training, the learning rate is 0.01, and learning rate decay reduces it by a factor of 10 at epochs 80 and 120. The model has to be trained for a total of 144 epochs.

I am using Python 3 and TensorFlow 2. The training dataset has 50000 examples and the batch size is 64, so one epoch is 50000/64 ≈ 781 training iterations. How can I use learning rate warmup and learning rate decay together in the code?

Currently, I am implementing the learning rate decay like this:

from tensorflow import keras

boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]

learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
# e.g. SGD; any Keras optimizer accepts a schedule as its learning rate
optimizer = keras.optimizers.SGD(learning_rate=learning_rate_fn)
print("\nCurrent step value: {0}, LR: {1:.6f}\n".format(
    optimizer.iterations.numpy(), learning_rate_fn(optimizer.iterations).numpy()))

However, I don't know how to use a learning rate warmup along with learning rate decay.

Help?

Upvotes: 5

Views: 11720

Answers (3)

Nwoye CID

Reputation: 944

from tensorflow import keras

start = 0
warmup = 5000
multiplier = 10.0
boundaries = [start, warmup, 100000, 110000]
# Scale the values themselves; a PiecewiseConstantDecay object cannot be multiplied.
values = [v * multiplier for v in [0.1, 0.5, 0.1, 0.05, 0.01]]

learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
optimizer = keras.optimizers.SGD(learning_rate=learning_rate_fn)
print("\nCurrent step value: {0}, LR: {1:.6f}\n".format(
    optimizer.iterations.numpy(), learning_rate_fn(optimizer.iterations).numpy()))

Upvotes: 0

Pauli

Reputation: 548

What about using the WarmUp schedule implementation from the Transformers library?

from typing import Callable

import tensorflow as tf


class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule):

    def __init__(
        self,
        initial_learning_rate: float,
        decay_schedule_fn: Callable,
        warmup_steps: int,
        power: float = 1.0,
        name: str = None,
    ):
        super().__init__()
        self.initial_learning_rate = initial_learning_rate
        self.warmup_steps = warmup_steps
        self.power = power
        self.decay_schedule_fn = decay_schedule_fn
        self.name = name

    def __call__(self, step):
        with tf.name_scope(self.name or "WarmUp") as name:
            # Implements polynomial warmup. i.e., if global_step < warmup_steps, the
            # learning rate will be `global_step/num_warmup_steps * init_lr`.
            global_step_float = tf.cast(step, tf.float32)
            warmup_steps_float = tf.cast(self.warmup_steps, tf.float32)
            warmup_percent_done = global_step_float / warmup_steps_float
            warmup_learning_rate = self.initial_learning_rate * tf.math.pow(warmup_percent_done, self.power)
            return tf.cond(
                global_step_float < warmup_steps_float,
                lambda: warmup_learning_rate,
                lambda: self.decay_schedule_fn(step - self.warmup_steps),
                name=name,
            )

    def get_config(self):
        return {
            "initial_learning_rate": self.initial_learning_rate,
            "decay_schedule_fn": self.decay_schedule_fn,
            "warmup_steps": self.warmup_steps,
            "power": self.power,
            "name": self.name,
        }
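For the schedule described in the question, a minimal sketch could look like this. The step counts are assumptions based on 781 iterations per epoch, the warmup ramps linearly from 0 rather than 0.00001, and the decay boundaries are shifted by the warmup length because WarmUp passes (step - warmup_steps) to the decay schedule:

steps_per_epoch = 781
decay_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[80 * steps_per_epoch - 10000, 120 * steps_per_epoch - 10000],
    values=[0.1, 0.01, 0.001])
lr_schedule = WarmUp(
    initial_learning_rate=0.1,
    decay_schedule_fn=decay_fn,
    warmup_steps=10000)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)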

Upvotes: 10

Aditya Mishra

Reputation: 1875

You can pass a learning rate schedule to any optimizer through its learning_rate argument. For example:

from tensorflow.keras.optimizers import RMSprop, schedules

boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]

lr_schedule = schedules.PiecewiseConstantDecay(boundaries, values)
optimizer = RMSprop(learning_rate=lr_schedule)
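The optimizer evaluates the schedule at optimizer.iterations on every update. You can also call the schedule directly to check the learning rate at any step, for example:

for step in [0, 100000, 105000, 110000, 120000]:
    print(step, float(lr_schedule(step)))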

Upvotes: 1
