Reputation: 2478
I have to train a VGG-19 CNN on CIFAR-10 using learning rate warmup: the learning rate warms up from 0.00001 to 0.1 over the first 10000 iterations (approximately 13 epochs). For the remainder of training, the learning rate is 0.01, and learning rate decay reduces it by a factor of 10 at epochs 80 and 120. The model has to be trained for a total of 144 epochs.
I am using Python 3 and TensorFlow 2. The training dataset has 50000 examples and the batch size is 64, so one epoch is 50000 / 64 ≈ 781 iterations. How can I use learning rate warmup and learning rate decay together in the code?
Currently, I am applying learning rate decay like this:
from tensorflow import keras

boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
# `optimizer` here is whatever optimizer the schedule was passed to
print("\nCurrent step value: {0}, LR: {1:.6f}\n".format(optimizer.iterations.numpy(), optimizer.learning_rate(optimizer.iterations)))
However, I don't know how to use learning rate warmup together with this learning rate decay.
Help?
Upvotes: 5
Views: 11720
Reputation: 944
from tensorflow import keras

start = 0
warmup = 5000
multiplier = 10.0
# A schedule object cannot be multiplied by a scalar, so fold the
# multiplier into the values instead. The extra boundaries at `start`
# and `warmup` carve out a warmup phase before the usual decay steps.
boundaries = [start, warmup, 100000, 110000]
values = [v * multiplier for v in [0.1, 0.5, 0.1, 0.05, 0.01]]
learning_rate_fn = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values)
print("\nCurrent step value: {0}, LR: {1:.6f}\n".format(optimizer.iterations.numpy(), optimizer.learning_rate(optimizer.iterations)))
Upvotes: 0
Reputation: 548
What about using the WarmUp schedule implementation from the Hugging Face Transformers library?
from typing import Callable

import tensorflow as tf


class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(
        self,
        initial_learning_rate: float,
        decay_schedule_fn: Callable,
        warmup_steps: int,
        power: float = 1.0,
        name: str = None,
    ):
        super().__init__()
        self.initial_learning_rate = initial_learning_rate
        self.warmup_steps = warmup_steps
        self.power = power
        self.decay_schedule_fn = decay_schedule_fn
        self.name = name

    def __call__(self, step):
        with tf.name_scope(self.name or "WarmUp") as name:
            # Implements polynomial warmup. i.e., if global_step < warmup_steps, the
            # learning rate will be `global_step/num_warmup_steps * init_lr`.
            global_step_float = tf.cast(step, tf.float32)
            warmup_steps_float = tf.cast(self.warmup_steps, tf.float32)
            warmup_percent_done = global_step_float / warmup_steps_float
            warmup_learning_rate = self.initial_learning_rate * tf.math.pow(warmup_percent_done, self.power)
            return tf.cond(
                global_step_float < warmup_steps_float,
                lambda: warmup_learning_rate,
                lambda: self.decay_schedule_fn(step - self.warmup_steps),
                name=name,
            )

    def get_config(self):
        return {
            "initial_learning_rate": self.initial_learning_rate,
            "decay_schedule_fn": self.decay_schedule_fn,
            "warmup_steps": self.warmup_steps,
            "power": self.power,
            "name": self.name,
        }
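Here is a rough sketch of how it could be wired up for the setup in the question (assuming ~781 iterations per epoch; SGD is just an example optimizer choice, and note that this class ramps the rate up from 0 rather than from 0.00001):

import tensorflow as tf

# Decay schedule applied after warmup. WarmUp calls
# decay_schedule_fn(step - warmup_steps), so these boundaries are counted
# from the end of warmup; subtract the 10000 warmup steps to decay at
# roughly epochs 80 and 120 (~781 iterations per epoch).
decay_fn = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[80 * 781 - 10000, 120 * 781 - 10000],
    values=[0.1, 0.01, 0.001])

# Linear warmup (power=1.0) up to 0.1 over the first 10000 steps,
# then hand off to the piecewise decay above.
lr_schedule = WarmUp(
    initial_learning_rate=0.1,
    decay_schedule_fn=decay_fn,
    warmup_steps=10000,
    power=1.0)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)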
Upvotes: 10
Reputation: 1875
You can pass the learning rate schedule to any optimizer by setting it as the learning_rate argument. For example:
from tensorflow.keras.optimizers import schedules, RMSprop

boundaries = [100000, 110000]
values = [1.0, 0.5, 0.1]
lr_schedule = schedules.PiecewiseConstantDecay(boundaries, values)
optimizer = RMSprop(learning_rate=lr_schedule)
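For a quick check, the schedule can also be called directly with a step count to see which rate would be active at that point, for example:
# Purely illustrative: query the schedule at a few step counts
for step in [0, 100000, 110001]:
    print(step, float(lr_schedule(step)))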
Upvotes: 1