Reputation: 854
A Hugging Face Transformers Trainer can receive a per_device_train_batch_size argument or an auto_find_batch_size argument.
However, they seem to have different effects. One thing to consider is that per_device_train_batch_size defaults to 8: it is always set, and you can't disable it.
I have also observed that if I run into OOM errors, lowering per_device_train_batch_size can solve the issue, but auto_find_batch_size doesn't solve the problem. This is quite counter-intuitive, since it should find a batch size that is small enough (I can do it manually).
So: what does auto_find_batch_size do, exactly?
Upvotes: 3
Views: 10886
Reputation: 1608
The auto_find_batch_size argument is an optional argument that can be used in addition to the per_device_train_batch_size argument.
As you point out, lowering the batch size is one way to resolve out-of-memory errors. The auto_find_batch_size argument automates that lowering process. Enabling it makes the Trainer use find_executable_batch_size from accelerate, which:
operates with exponential decay, decreasing the batch size in half after each failed run
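In practice, enabling it is just a flag on TrainingArguments. A minimal sketch (model and train_dataset are assumed to be defined elsewhere; the only parameters relevant here are output_dir, per_device_train_batch_size, and auto_find_batch_size):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                # required output directory
    per_device_train_batch_size=8,   # the starting batch size
    auto_find_batch_size=True,       # retry with a smaller batch size on CUDA OOM
)

# `model` and `train_dataset` are assumed to exist already
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```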
The per_device_train_batch_size is used as the initial batch size to start off with. So if you use the default of 8, it starts training with a batch size of 8 (on a single device), and if that fails, it will restart the training procedure with a batch size of 4.
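To see the halving behaviour in isolation, here is a rough sketch of how find_executable_batch_size can be used directly from accelerate (run_one_step is a hypothetical stand-in for a training step that may raise a CUDA out-of-memory error):

```python
from accelerate.utils import find_executable_batch_size

@find_executable_batch_size(starting_batch_size=8)
def train(batch_size):
    # The decorator injects batch_size, starting at 8; if this function raises
    # an out-of-memory error, it is called again with the batch size halved.
    print(f"Trying batch size {batch_size}")
    run_one_step(batch_size)  # hypothetical helper that OOMs when batch_size is too large

train()  # tries 8, then 4, then 2, ... until a run succeeds
```

So with auto_find_batch_size=True, an OOM at the initial batch size restarts training at half that size instead of aborting the run.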
Upvotes: 4