user915783

Reputation: 699

Optimal batch size for image classification using deep learning

I have a broad question, but it should still be relevant. Let's say I am doing 2-class image classification with a CNN; a batch size of 32-64 should be sufficient for training. However, if I had data with about 13 classes, surely a batch size of 32 would not be sufficient for a good model, as each batch might only get 2-3 images of each class. Is there a generic or approximate formula to determine the batch size for training, or should it be determined as a hyperparameter using techniques like grid search or Bayesian methods?


Upvotes: 3

Views: 7418

Answers (1)

Nopileos

Reputation: 2117

Batch size is a hyperparameter, just like the learning rate, and it is hard to say in advance what the perfect size is for your problem. The issue you mention does exist, but it is really only relevant in specific problems where you can't simply do random sampling, such as face/person re-identification.
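Since you asked about grid search: below is a minimal sketch of treating batch size as a hyperparameter and grid-searching it together with the learning rate. The tiny CNN and the random toy data are stand-ins of my own (not from your setup); swap in your real model and dataset. A Bayesian optimizer could replace the grid loop in the same way.

```python
import itertools
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data: 13 classes of 32x32 RGB "images" (replace with your dataset).
torch.manual_seed(0)
num_classes = 13
train_set = TensorDataset(torch.randn(1024, 3, 32, 32),
                          torch.randint(0, num_classes, (1024,)))
val_set = TensorDataset(torch.randn(256, 3, 32, 32),
                        torch.randint(0, num_classes, (256,)))

def build_model():
    # Deliberately tiny CNN; swap in your real architecture.
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(8, num_classes))

def train_and_validate(batch_size, lr, epochs=3):
    model = build_model()
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    model.eval()
    with torch.no_grad():
        x, y = val_set.tensors
        return (model(x).argmax(dim=1) == y).float().mean().item()

# Simple grid search over (batch_size, lr) candidates.
best = max(((train_and_validate(bs, lr), bs, lr)
            for bs, lr in itertools.product([16, 32, 64, 128], [1e-2, 1e-1])),
           key=lambda t: t[0])
print("best (val_acc, batch_size, lr):", best)
```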

For "normal" problems random sampling is sufficient. The reason behind minibatch training is, to get a more stable training. You want your weight updates to go in the right direction in regards to the global minimum of the loss function for the whole dataset. A minibatch is an approximation of this.

As you increase the batch size you get fewer updates, but "better" ones; with a small batch size you get more updates, but they will more often point in the wrong direction. If the batch size is too small (e.g. 1), the network may take a long time to converge, which increases training time. Too large a batch size can hurt the generalization of the network. A good paper on the topic is On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima.

Another interesting paper on the topic is Don't Decay the Learning Rate, Increase the Batch Size, which analyzes the effect of batch size on training. In general, the learning rate and the batch size interact with each other.
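One common heuristic for that interaction (my addition, not something either paper mandates) is the linear scaling rule from Goyal et al., "Accurate, Large Minibatch SGD": when you multiply the batch size by k, multiply the learning rate by k as well. A trivial sketch:

```python
def scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: learning rate grows proportionally with batch size."""
    return base_lr * batch_size / base_batch_size

# If lr=0.1 worked at batch size 32, start around 0.8 at batch size 256.
print(scaled_lr(0.1, 32, 256))
```

Treat the scaled value as a starting point to tune from, not a guarantee; very large batches usually also need a warmup phase.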

In general, batch size is more a factor for reducing training time: a larger batch size lets you exploit parallelism and means fewer weight updates and more stability. As with everything, look at what others did for a task comparable to yours, take that as a baseline, and experiment with it a little. Also, with huge networks the available memory often limits the maximum batch size anyway.
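If the memory limit is what you end up hitting, one practical way to find the ceiling is to double the batch size until the GPU runs out of memory. A hedged sketch (assumes a CUDA device; the model and input shape in the commented example are placeholders for your own setup):

```python
import torch
from torch import nn

def max_feasible_batch_size(model, input_shape, device="cuda", start=8, limit=4096):
    """Double the batch size until a CUDA OOM error occurs; return the last size that fit."""
    model = model.to(device)
    bs, feasible = start, None
    while bs <= limit:
        try:
            x = torch.randn(bs, *input_shape, device=device)
            model(x).sum().backward()      # forward + backward so gradients count too
            feasible = bs
            bs *= 2
        except RuntimeError as e:          # CUDA OOM surfaces as a RuntimeError
            if "out of memory" not in str(e):
                raise
            break
        finally:
            model.zero_grad(set_to_none=True)
            torch.cuda.empty_cache()
    return feasible

# Hypothetical usage with a small CNN and 224x224 RGB inputs:
# print(max_feasible_batch_size(
#     nn.Sequential(nn.Conv2d(3, 64, 3), nn.Flatten(), nn.LazyLinear(10)),
#     (3, 224, 224)))
```

Whatever number this reports is only an upper bound; whether you should actually train at that size is still the generalization/learning-rate trade-off discussed above.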

Upvotes: 4
