machinery

Reputation: 6290

What batch size for neural network?

I have a training set consisting of 36 data points, and I want to train a neural network on it. As the batch size I can choose, for example, 1, 12, or 36 (any number that divides 36 evenly).

Of course, when I increase the batch size, the training runtime decreases substantially.

Is there a disadvantage if I choose e.g. 12 as the batch size instead of 1?

Upvotes: 2

Views: 2148

Answers (2)

Oleg Melnikov

Reputation: 3298

I agree with lejlot. The batch size is not the problem in your current model building, given the very small data size. Once you move on to larger data that can't fit in memory, try different batch sizes (say, some powers of 2, e.g. 32, 128, 512, ...).

The choice of batch size depends on:

  1. Your hardware capacity and model architecture. Given enough memory and enough bandwidth on the bus carrying data from memory to the CPU/GPU, larger batch sizes result in faster training. However, whether the quality of the resulting model holds up is debated.
  2. The algorithm and its implementation. For example, the Keras Python package (which is built on either the Theano or the TensorFlow implementation of neural network algorithms) states:

A batch generally approximates the distribution of the input data better than a single input. The larger the batch, the better the approximation; however, it is also true that the batch will take longer to process and will still result in only one update. For inference (evaluate/predict), it is recommended to pick a batch size that is as large as you can afford without going out of memory (since larger batches will usually result in faster evaluating/prediction).
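
To make the "only one update" point concrete for the 36-point dataset from the question: with one weight update per batch, the number of updates per epoch is the dataset size divided by the batch size. Here is a minimal sketch of that arithmetic (the helper name `updates_per_epoch` is made up for illustration and is not part of Keras):

```python
import math

def updates_per_epoch(n_samples, batch_size):
    # One gradient update per batch; a final partial batch still counts as one update.
    return math.ceil(n_samples / batch_size)

n_samples = 36  # dataset size from the question

# Batch sizes that divide 36 evenly, so every batch is full.
batch_sizes = [b for b in range(1, n_samples + 1) if n_samples % b == 0]

table = {b: updates_per_epoch(n_samples, b) for b in batch_sizes}
for b, u in table.items():
    print(f"batch_size={b:2d} -> {u:2d} updates per epoch")
```

So a batch size of 1 gives 36 noisy updates per pass over the data, while a batch size of 36 gives a single update computed from all points.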

You will develop a better intuition after trying different batch sizes. If your hardware and time allow, have the machine pick the right batch size for you (loop through different batch sizes as part of the grid search).
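
Looping over batch sizes could look roughly like the sketch below. To keep it self-contained it does not use Keras: it assumes synthetic data (y = 2x + 1 plus noise), a one-weight linear model trained with plain mini-batch SGD, and a separate validation set. All names (`make_data`, `train`, `mse`) and all settings (learning rate, epochs) are placeholders, not anything from the question.

```python
import random

def make_data(n, seed):
    # Synthetic stand-in data (an assumption): y = 2x + 1 plus a little Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

def mse(w, b, data):
    # Mean squared error of the linear model w*x + b on the given data.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(data, batch_size, epochs=50, lr=0.1, seed=0):
    # Plain mini-batch SGD: shuffle each epoch, one update per batch.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Gradient of the batch mean squared error with respect to w and b.
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

train_set = make_data(36, seed=1)   # 36 points, as in the question
val_set = make_data(100, seed=2)    # held-out data used to compare batch sizes

results = {}
for batch_size in (1, 12, 36):
    w, b = train(train_set, batch_size)
    results[batch_size] = mse(w, b, val_set)
    print(f"batch_size={batch_size:2d}  validation MSE={results[batch_size]:.4f}")

best = min(results, key=results.get)
print("lowest validation loss at batch_size =", best)
```

In a real grid search the batch size would be varied together with other hyperparameters such as the learning rate, since the best value of one usually depends on the other.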

Here are some good answers: one, two.

Upvotes: 1

lejlot

Reputation: 66825

There are no golden rules for batch sizes. Period.

However, your dataset is extremely tiny, so the batch size will probably not matter at all. All your problems will come from the lack of data, not from any hyperparameters.

Upvotes: 6
