apgsov

Reputation: 904

HuggingFace's linear scheduler with warmup parameters

HuggingFace's get_linear_schedule_with_warmup takes as arguments:

  1. num_warmup_steps (int) — The number of steps for the warmup phase.
  2. num_training_steps (int) — The total number of training steps.

And in the guide on a full training process, with a similar scheduler, they state:

To properly define [the scheduler], we need to know the number of training steps we will take, which is the number of epochs we want to run multiplied by the number of training batches (which is the length of our training dataloader).

I want to follow an implementation from a research paper in which they apply linear learning rate warm-up during the first 10% of the updates followed by a linear decay.

I was a bit confused by the wording "first 10% of the updates" — would this correspond to 10% of the entirety of training? Am I right in assuming that, since num_training_steps is based on the number of epochs multiplied by the number of batches, num_warmup_steps = number of batches * number of epochs * 0.1?
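To make my assumption concrete, here is the arithmetic I have in mind (the batch and epoch counts are made-up numbers, just for illustration; the commented-out scheduler call is how I would then wire the values into get_linear_schedule_with_warmup):

```python
# Hypothetical training setup: 1000 batches per epoch, 3 epochs.
batches_per_epoch = 1000
num_epochs = 3

# Total number of optimizer updates over the whole run.
num_training_steps = batches_per_epoch * num_epochs  # 3000

# Warm up over the first 10% of those updates.
num_warmup_steps = int(0.1 * num_training_steps)  # 300

# scheduler = get_linear_schedule_with_warmup(
#     optimizer,
#     num_warmup_steps=num_warmup_steps,
#     num_training_steps=num_training_steps,
# )
print(num_training_steps, num_warmup_steps)
```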

Upvotes: 3

Views: 4326

Answers (1)

cronoik

Reputation: 19495

I see it the same way as you: "the first 10% of the updates" refers to the first 10% of the total number of training steps.

Commonly, a formula like this is used to get the number of total training steps:

t_total = len(train_dataloader) // gradient_accumulation_steps * num_of_epochs
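A quick sanity check of the formula with made-up numbers (the dataloader length, accumulation steps, and epoch count below are hypothetical). The key point is that one optimizer update spans gradient_accumulation_steps batches, so the per-epoch batch count is divided by it before multiplying by the number of epochs:

```python
# Hypothetical values: a dataloader yielding 1000 batches per epoch,
# gradients accumulated over 4 batches per update, 3 epochs.
len_train_dataloader = 1000
gradient_accumulation_steps = 4
num_of_epochs = 3

# Updates per epoch = batches per epoch // accumulation steps (250),
# then multiply by the number of epochs to get the total update count.
t_total = len_train_dataloader // gradient_accumulation_steps * num_of_epochs  # 750

# First 10% of the updates as warmup, per the paper's description.
num_warmup_steps = int(0.1 * t_total)  # 75
print(t_total, num_warmup_steps)
```

Without gradient accumulation (gradient_accumulation_steps = 1), this reduces to the questioner's number of batches * number of epochs.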

Upvotes: 1
