harinsamaranayake

Reputation: 961

Can someone explain the relationship between batch size and steps per epoch?

I have a training set containing 272 images.

  1. batch size = 8, steps per epoch = 1 > trains the model on just 8 images and jumps to the next epoch?
  2. batch size = 8, steps per epoch = 34 (no shuffle) > trains the model on all 272 images and jumps to the next epoch?
  3. At the end of each step per epoch, does it update the weights of the model?
  4. If so, does increasing the number of steps per epoch give a better result?
  5. Is there a convention for selecting batch size & steps per epoch?

Upvotes: 2

Views: 2143

Answers (2)

Mohamed TOUATI

Reputation: 388

The batch size defines the number of samples that propagate through the network before the model parameters are updated.

Each batch of samples goes through one full forward and backward propagation.

Example:

Total training samples (images) = 3000
batch_size = 32
epochs = 500

Then…
32 samples are taken at a time to train the network.
To go through all 3000 samples it takes 3000/32 ≈ 94 iterations = 1 epoch.
This process repeats 500 times (epochs).
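The arithmetic above can be sketched in a few lines of Python (a minimal illustration; `math.ceil` is used because the final, smaller batch still counts as an iteration):

```python
import math

total_samples = 3000
batch_size = 32
epochs = 500

# Iterations (weight updates) needed to see every sample once = 1 epoch:
steps_per_epoch = math.ceil(total_samples / batch_size)  # 3000/32 -> 94
# Total weight updates over the whole training run:
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 94
print(total_updates)    # 47000
```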


You may be limited to small batch sizes based on your system hardware (RAM + GPU).

Smaller batches mean each step in gradient descent may be less accurate, so it may take longer for the algorithm to converge.

But, it has been observed that for larger batches there is a significant degradation in the quality of the model, as measured by its ability to generalize. 

Batch size of 32 or 64 is a good starting point. 

Summary:
Larger batch sizes make faster progress per epoch, but don't always converge as fast.
Smaller batch sizes train more slowly, but can converge faster.

Upvotes: 0

Nafiz Ahmed

Reputation: 567

If I provide the definition using the 272 images as the training dataset and 8 as batch size,

  • batch size - the number of images that will be fed to the neural network together in one step.
  • epoch - one iteration over all the images in the dataset.
  • steps - usually the dataset size and batch size determine the steps per epoch, and the number of epochs determines the total steps. By default, here, steps = 272/8 = 34 per epoch. In total, if you want 10 epochs, you get 10 x 34 = 340 steps.
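The same default calculation, spelled out for these numbers (a trivial sketch; `math.ceil` covers the general case where the dataset size is not an exact multiple of the batch size):

```python
import math

dataset_size = 272
batch_size = 8
epochs = 10

steps_per_epoch = math.ceil(dataset_size / batch_size)  # 272/8 = 34
total_steps = steps_per_epoch * epochs                  # 10 x 34 = 340

print(steps_per_epoch, total_steps)
```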

Now, if your dataset is very large, or if there are many possible ways to augment your images (which can again lead to a dataset of effectively infinite or dynamic length), how do you set the epoch boundary? You simply use steps per epoch to set it: you pick an arbitrary value, say 100 steps, and treat your effective dataset length per epoch as 100 x 8 = 800 samples. How you do the augmentation is a separate matter; normally you rotate, crop, or scale by random values each time.
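A minimal sketch of that idea, with a plain Python generator standing in for an augmentation pipeline (the function name and the random "rotation" are placeholders of mine, not any particular library's API):

```python
import random

def augmented_batches(images, batch_size):
    """Yield batches forever; each image is paired with a random rotation angle
    as a stand-in for real augmentation (rotate/crop/scale)."""
    while True:
        batch = random.sample(images, batch_size)
        yield [(img, random.uniform(-30.0, 30.0)) for img in batch]

images = list(range(272))        # stand-ins for the 272 training images
gen = augmented_batches(images, batch_size=8)

steps_per_epoch = 100            # arbitrary boundary, as described above
for epoch in range(2):
    for step in range(steps_per_epoch):
        batch = next(gen)        # train on this batch here
```

Because the generator never raises `StopIteration`, the loop over `steps_per_epoch` is the only thing that defines where one epoch ends and the next begins.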

Anyway, coming to the answers to your questions -

  1. Yes
  2. Yes
  3. Yes, if you are using mini-batch gradient descent
  4. Well, yes unless it overfits or your data is very small or ... there are a lot of other things to consider.
  5. I am not aware of any. But for a ballpark figure, you can check on the training mechanism of high accuracy open source trained models in your domain.

(Note: I am not actively working in this field any more. So some things may have changed or I may be mistaken.)

Upvotes: 3
