I have a training set containing 272 images.
Upvotes: 2
The batch size defines the number of samples that propagate through the network before the model parameters are updated.
Each batch of samples goes through one full forward and backward propagation.
Example:
Total training samples (images) = 3000
batch_size = 32
epochs = 500
Then…
32 samples will be taken at a time to train the network.
To go through all 3000 samples, it takes 3000/32 ≈ 94 iterations, which is 1 epoch.
This process continues 500 times (epochs).
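To make the bookkeeping concrete, here is a minimal Python sketch of the same arithmetic (the variable names are just illustrative):

    import math

    total_samples = 3000
    batch_size = 32
    epochs = 500

    # One epoch = enough iterations (weight updates) to see every sample once.
    iterations_per_epoch = math.ceil(total_samples / batch_size)   # ceil(93.75) = 94
    total_iterations = iterations_per_epoch * epochs               # 94 * 500 = 47000

    print(iterations_per_epoch, "iterations per epoch,", total_iterations, "in total")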
You may be limited to small batch sizes based on your system hardware (RAM + GPU).
Smaller batches mean each step in gradient descent may be less accurate, so it may take longer for the algorithm to converge.
But, it has been observed that for larger batches there is a significant degradation in the quality of the model, as measured by its ability to generalize.
A batch size of 32 or 64 is a good starting point.
Summary:
Larger batch sizes make faster progress through the training data, but don't always converge as quickly.
Smaller batch sizes process the data more slowly, but can converge faster.
Upvotes: 0
If I provide the definition using the 272 images as the training dataset and 8 as batch size,
Now, if your dataset is very large, or there are many possible ways to augment your images (which can again lead to a dataset of effectively infinite or dynamic length), how do you set an epoch in this case? You simply use steps_per_epoch to set a boundary. You pick an arbitrary value, say 100, and with a batch size of 8 you effectively treat your total dataset length as 100 × 8 = 800 samples per epoch. How you do the augmentation is a separate matter; normally you can rotate, crop, or scale by random values each time.
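As a rough sketch of how this can look in Keras (the directory name, image size, augmentation ranges, model architecture, and number of epochs below are placeholders I chose for illustration; only batch_size=8 and steps_per_epoch=100 come from the discussion above):

    import tensorflow as tf

    # Augmentation by random rotation/zoom, so each pass yields different images.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=20,   # rotate by random values
        zoom_range=0.2,      # scale by random values
    )

    # "train_dir" is a placeholder path to the 272 training images.
    train_gen = datagen.flow_from_directory(
        "train_dir", target_size=(128, 128), batch_size=8, class_mode="binary"
    )

    # Placeholder model, only to make the example self-contained.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(128, 128, 3)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # With augmentation the sample stream is effectively unbounded, so
    # steps_per_epoch draws the epoch boundary: 100 steps * 8 images = 800 samples.
    model.fit(train_gen, steps_per_epoch=100, epochs=10)

For comparison, without augmentation an epoch over the 272 images at batch size 8 would be 272/8 = 34 steps; steps_per_epoch simply overrides that count.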
Anyway, coming to the answers to your questions -
(Note: I am not actively working in this field any more. So some things may have changed or I may be mistaken.)
Upvotes: 3