Madhav Thakker

Reputation: 107

Why do different batch sizes give different accuracy in Keras?

I was using a Keras CNN to classify the MNIST dataset. I found that using different batch sizes gave different accuracies. Why is that?

Using Batch-size 1000 (Acc = 0.97600)

Using Batch-size 10 (Acc = 0.97599)

Although the difference is very small, why is there a difference at all? EDIT - I have found that the difference is only because of precision issues and the two values are in fact equal.
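For reference, the comparison looks roughly like this (a minimal sketch with a toy CNN, assuming tf.keras; not my exact model or training settings):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load and normalize MNIST.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0
x_test = x_test[..., None].astype("float32") / 255.0

def build_model():
    # Small illustrative CNN, not the original architecture.
    model = keras.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train the same model twice, changing only the batch size.
for batch_size in (1000, 10):
    model = build_model()
    model.fit(x_train, y_train, epochs=5, batch_size=batch_size, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"batch_size={batch_size}: test accuracy = {acc:.5f}")
```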

Upvotes: 3

Views: 8626

Answers (3)

pouyan

Reputation: 3439

That is because of the effect of mini-batch gradient descent during the training process. You can find a good explanation Here; I mention some notes from that link below:

Batch size is a slider on the learning process.

  1. Small values give a learning process that converges quickly at the cost of noise in the training process.
  2. Large values give a learning process that converges slowly with accurate estimates of the error gradient.

Another important note from that link is:

The presented results confirm that using small batch sizes achieves the best training stability and generalization performance, for a given computational cost, across a wide range of experiments. In all cases the best results have been obtained with batch sizes m = 32 or smaller

This is the result reported in this paper.

EDIT

I should mention two more points here:

  1. Because of the inherent randomness in machine learning algorithms, you generally should not expect machine learning algorithms (such as deep learning algorithms) to produce the same results across different runs. You can find more details Here. (A minimal seeding sketch follows this list.)
  2. On the other hand, your two results are so close that they are effectively equal. So in your case, based on the reported results, we can say that the batch size has no effect on your network's accuracy.
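A minimal seeding sketch, assuming tf.keras (the helper name and seed value are just illustrative), that removes most of that run-to-run randomness so only the batch size differs between runs:

```python
import os
import random
import numpy as np
import tensorflow as tf

def set_seeds(seed=42):
    # Fix all relevant random sources so repeated runs start from the
    # same weight initialization and data shuffling.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

set_seeds(42)
# ... build, compile and fit the model as usual ...
```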

Upvotes: 6

pitfall

Reputation: 2621

I want to add two points:

1) With special treatments, it is possible to achieve similar performance with a very large batch size while speeding up the training process tremendously. For example, Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (see the sketch after point 2).

2) Regarding your MNIST example, I really don't suggest you over-read these numbers, because the difference is so subtle that it could be caused by noise. I bet that if you try models saved at different epochs, you will see a different result.
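For point 1, here is a rough sketch of the linear learning-rate scaling rule described in that paper (illustrative values and a tf.keras optimizer; not the authors' code):

```python
from tensorflow import keras

# Linear scaling rule: when the mini-batch size is multiplied by k,
# multiply the learning rate by k as well (values are illustrative).
base_batch_size = 256
base_lr = 0.1

def scaled_sgd(batch_size):
    lr = base_lr * batch_size / base_batch_size
    return keras.optimizers.SGD(learning_rate=lr, momentum=0.9)

optimizer = scaled_sgd(batch_size=8192)   # lr = 3.2 for a very large batch
```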

Upvotes: 0

alexhg

Reputation: 779

This is not specific to Keras. The batch size and the learning rate are critical hyper-parameters for training neural networks with mini-batch stochastic gradient descent (SGD); they strongly affect the learning dynamics and thus the accuracy, the learning speed, etc.

In a nutshell, SGD optimizes the weights of a neural network by iteratively updating them in the (negative) direction of the gradient of the loss. In mini-batch SGD, the gradient is estimated at each iteration on a subset of the training data. This estimate is noisy, which helps regularize the model, and therefore the size of the batch matters a lot. Besides, the learning rate determines how much the weights are updated at each iteration. Finally, although it may not be obvious, the learning rate and the batch size are related to each other. [paper]
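To make the mechanics concrete, here is a toy illustration (plain NumPy on a linear model; not Keras internals) of a single mini-batch SGD step, where the gradient is estimated on a random subset of the data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                        # toy inputs
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)     # toy targets

w = np.zeros(5)                                       # model weights
learning_rate = 0.1
batch_size = 32

# One mini-batch SGD step: estimate the gradient of the MSE loss on a
# random subset, then move the weights in the negative gradient direction.
idx = rng.choice(len(X), size=batch_size, replace=False)
Xb, yb = X[idx], y[idx]
grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
w = w - learning_rate * grad
```

Smaller batches make this gradient estimate noisier, which changes the learning dynamics run to run.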

Upvotes: 1
