mil77

Reputation: 43

Is it really necessary to feed the neural network in chunks that are all multiples of the batch size?

In an online tutorial the batch size for feeding the neural network was set as m = 32, and the training and validation sets were cut so that they form chunks that are multiples of 32, like this:

X_train, X_val = X_train[:462 * batch_size], X_train[-56 * batch_size:]

y_train, y_val = y_train[:462 * batch_size], y_train[-56 * batch_size:]

Is this really necessary or is just an extra precaution?

Upvotes: 1

Views: 302

Answers (1)

Salman.S

Reputation: 96

Choosing a mini-batch size that is a power of 2 helps because of the way memory works on your GPU, and on computer hardware in general. Your mini-batches are vectorized and processed in parallel on the GPU, and a batch size that is not a power of 2 can lead to less efficient memory access and hence poorer performance. When you are dealing with a large volume of data, small inefficiencies can have a large impact on overall training time.
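For instance, when tuning, you might restrict the search to powers of two (a minimal illustration; the range is arbitrary):

candidate_batch_sizes = [2 ** k for k in range(4, 10)]  # [16, 32, 64, 128, 256, 512]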

You could just use the maximum batch size that fits on your GPU, but that may not give you the best results, because batch size affects learning significantly. With smaller batch sizes the gradient estimate at each update is noisier, which helps the algorithm avoid local minima. But going too small also makes training less efficient: the weights jump around too much and the cost converges much more slowly.
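To generalize the slicing from your question without hard-coding the batch counts, here is a minimal NumPy sketch (the array shapes and the 462/56 split are just illustrative, mirroring your numbers):

import numpy as np

batch_size = 32

# Hypothetical data: 16,600 samples, deliberately not a multiple of 32.
X = np.random.rand(16_600, 10).astype(np.float32)
y = np.random.randint(0, 2, size=16_600)

# Keep only as many samples as fit into whole batches.
n_full_batches = len(X) // batch_size   # 518 full batches
n_keep = n_full_batches * batch_size    # 16,576 samples
X, y = X[:n_keep], y[:n_keep]

# Split the whole batches between training and validation,
# mirroring the 462 / 56 split from the question.
n_val = 56 * batch_size
X_train, X_val = X[:-n_val], X[-n_val:]
y_train, y_val = y[:-n_val], y[-n_val:]

For what it's worth, PyTorch's DataLoader exposes the same trimming idea through its drop_last=True flag, so you rarely need to slice the arrays by hand.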

Upvotes: 1
