Reputation: 904
HuggingFace's get_linear_schedule_with_warmup takes as arguments:

get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1)
And in the guide on writing a full training loop, which uses a similar scheduler, they state:
To properly define [the scheduler], we need to know the number of training steps we will take, which is the number of epochs we want to run multiplied by the number of training batches (which is the length of our training dataloader).
I want to follow an implementation from a research paper in which they apply linear learning rate warm-up during the first 10% of the updates, followed by a linear decay.
I was a bit confused by the wording "first 10% of the updates": would this correspond to 10% of the entire training run? Am I right in assuming that, since num_training_steps is based on the number of epochs multiplied by the number of batches, num_warmup_steps = number of batches * number of epochs * 0.1?
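Concretely, here is a sketch of what I have in mind (model and train_dataloader are assumed to already exist; the learning rate and epoch count are just placeholders):

import torch
from transformers import get_linear_schedule_with_warmup

num_epochs = 3  # placeholder
num_training_steps = num_epochs * len(train_dataloader)  # epochs * batches per epoch
num_warmup_steps = int(0.1 * num_training_steps)         # warm up over the first 10% of updates

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # placeholder lr
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=num_training_steps,
)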
Upvotes: 3
Views: 4326
Reputation: 19495
I see it the same way as you: the "first 10% of the updates" refers to 10% of the total number of training steps.
Commonly, a formula like the following is used to compute the total number of training steps (with gradient accumulation, one parameter update happens every gradient_accumulation_steps batches, so the dataloader length is divided by that factor):

t_total = len(train_dataloader) // gradient_accumulation_steps * num_of_epochs
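As a rough sketch of how this fits together (model, train_dataloader, and the hyperparameters below are placeholders), note that scheduler.step() is called once per optimizer update, not once per batch, which is why t_total counts updates rather than batches:

import torch
from transformers import get_linear_schedule_with_warmup

num_of_epochs = 3               # placeholder
gradient_accumulation_steps = 2  # placeholder

t_total = len(train_dataloader) // gradient_accumulation_steps * num_of_epochs
num_warmup_steps = int(0.1 * t_total)  # first 10% of the updates

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps, t_total)

model.train()
for epoch in range(num_of_epochs):
    for step, batch in enumerate(train_dataloader):
        loss = model(**batch).loss / gradient_accumulation_steps
        loss.backward()
        if (step + 1) % gradient_accumulation_steps == 0:
            optimizer.step()
            scheduler.step()  # advance the schedule once per optimizer update
            optimizer.zero_grad()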
Upvotes: 1