Reputation: 2893
I see some code uses:
n_batches = int(n_rows / batch_size)
What if n_rows is not a multiple of batch size?
Is n_batches still correct?
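For example, with made-up numbers:

    n_rows, batch_size = 10, 3
    n_batches = int(n_rows / batch_size)   # -> 3, so 3 * 3 = 9 rows; is the 10th row just dropped?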
Upvotes: 1
Views: 224
Reputation: 1598
In fact you see this in a lot of code, and since labeled data is extremely valuable you don't want to lose precious labeled examples. At first glance it looks like a bug, and it seems that we are losing some training examples, but we have to take a closer look at the code.
When you see this, in general, as in the code you sent, the data is shuffled after each epoch (one epoch here means seeing n_batches = int(n_rows / batch_size) batches, i.e. n_batches * batch_size examples). Therefore over time (after several epochs) the network sees all your training examples, because a different subset is left out each epoch. We're not losing any examples \o/
Small conclusion: if you see this pattern, make sure the data is shuffled at each epoch, otherwise your network might never see some training examples.
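To make this concrete, here is a minimal sketch of such a training loop (names like features, labels, train_step and n_epochs are placeholders, not from the original code):

    import numpy as np

    n_rows = len(features)
    n_batches = int(n_rows / batch_size)            # leftover rows are skipped this epoch

    for epoch in range(n_epochs):
        perm = np.random.permutation(n_rows)        # reshuffle the row order every epoch
        for b in range(n_batches):
            idx = perm[b * batch_size:(b + 1) * batch_size]
            train_step(features[idx], labels[idx])  # always exactly batch_size examples
        # the rows left out this epoch will land inside a batch in some later epoch,
        # because the permutation changes every time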
It's efficient: by using this mechanism you ensure that at each training step your network sees exactly batch_size examples, and you never run a training step on a handful of leftover examples.
It's more rigorous: imagine you have one example left over and you don't shuffle. Assuming your loss is the average loss over the batch, at each epoch this last example would be equivalent to a batch consisting of one element repeated batch_size times, i.e. it would be weighted as if it were more important. If you shuffle, this effect is reduced (since the leftover example changes over time), but it's more rigorous to keep a constant batch size during a training epoch.
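To put numbers on the weighting effect (a small illustration, assuming a batch-averaged loss and batch_size = 32):

    batch_size = 32
    weight_in_full_batch = 1 / batch_size       # each example contributes 1/32 to the batch loss
    weight_alone = 1 / 1                        # a leftover example alone in the last batch
    print(weight_alone / weight_in_full_batch)  # 32.0 -> that example has 32x the influence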
There are also some advantages to shuffling your data during training; see this StatsExchange post.
I'll also add that if you are using a mechanism such as Batch Normalization, it's better to have a constant batch size during training: for example, if n_rows % batch_size == 1, passing a single example as a batch during training can cause trouble.
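A quick illustration of why a batch of size 1 is problematic for Batch Normalization (plain NumPy, not a specific framework's implementation): the batch statistics come from a single example, so the normalized output collapses to zeros.

    import numpy as np

    def batch_norm(x, eps=1e-5):
        # normalize each feature with the statistics of the current batch
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        return (x - mean) / np.sqrt(var + eps)

    full_batch = np.random.randn(32, 4)
    print(batch_norm(full_batch).std())   # roughly 1, as expected

    single = np.random.randn(1, 4)
    print(batch_norm(single))             # all zeros: mean == the example itself, var == 0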
Note: I'm talking about a constant batch size during a training epoch, not over the whole training cycle (multiple epochs), because even though the batch size is normally constant over the whole training process, there is research work that modifies the batch size during training, e.g. Don't Decay the Learning Rate, Increase the Batch Size.
Upvotes: 1