Reputation: 71
I am taking a course on Deep Learning in Python and I am stuck on the following lines of an example:
regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
From the definitions I know, 1 epoch = going through all training examples once to do one weight update.
batch_size is used by the optimizer to divide the training examples into mini-batches; each mini-batch is of size batch_size.
I am not familiar with Adam optimization, but I believe it is a variation of GD or mini-batch GD. Gradient Descent - has one big batch (all the data), but multiple epochs. Mini-Batch Gradient Descent - uses multiple mini-batches, but only 1 epoch.
Then, how come the code has both multiple mini-batches and multiple epochs? Does epoch in this code have a different meaning than the definition above?
Upvotes: 2
Views: 7247
Reputation: 2723
Your understanding of epoch and batch_size seems correct.
A little more precision below.
An epoch corresponds to one whole training dataset sweep. This sweep can be performed in several ways.
For N_examples examples in the training dataset, N_examples/batch_size optimisation iterations correspond to one epoch.
In your case (epochs=100, batch_size=32), the regressor would sweep the whole dataset 100 times, with mini data batches of size 32 (i.e. mini-batch mode).
If I assume your dataset size is N_examples, the regressor would perform N_examples/32 model weight optimisation iterations per epoch.
So for 100 epochs: 100 * N_examples/32 model weight optimisation iterations.
All in all, having epochs > 1 and having batch_size > 1
are compatible.
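To make the arithmetic concrete, here is a small sketch (the dataset size N_examples = 3200 is purely an assumed value for illustration; only epochs and batch_size come from the question):
import math

N_examples = 3200   # assumed dataset size, for illustration only
batch_size = 32
epochs = 100

# One epoch = one full sweep, so the number of weight updates per epoch is the
# number of mini-batches (a last, smaller batch still counts as one update).
updates_per_epoch = math.ceil(N_examples / batch_size)
total_updates = epochs * updates_per_epoch

print(updates_per_epoch)  # 100 optimisation iterations per epoch
print(total_updates)      # 10000 optimisation iterations over the whole training run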
Upvotes: 2
Reputation: 11450
Although the other answer basically already gives you the correct result, I would like to clarify a few points you made in your post and correct them.
The (commonly accepted) definitions of the different terms are as follows. One epoch - one full sweep over the entire training set, not a single weight update. Batch size - the number of training examples used to compute one weight update. Iterations per epoch - N_examples/batch_size, i.e. how many weight updates happen within one epoch. So multiple epochs and multiple mini-batches are not mutually exclusive: every epoch simply consists of many mini-batch updates.
On another note, you are correct about Adam. It is generally seen as a more powerful variant of vanilla gradient descent, since it uses more sophisticated heuristics (running estimates of the first and second moments of the gradients) to speed up and stabilize convergence.
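On the Adam point, here is a minimal sketch of a single Adam update step in plain NumPy, with the usual default hyperparameter values assumed (this is not Keras's internal implementation):
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running (biased) estimates of the first and second moments of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction matters early on, while m and v are still close to their zero initialisation.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: the second-moment estimate adapts the effective learning rate.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
Such a step runs once per mini-batch, so with epochs=100 and N_examples/32 mini-batches per epoch it is executed 100 * N_examples/32 times over the whole training run.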
Upvotes: 3
Reputation: 51
Assume you have 3200 examples to train your model. Then 1 epoch = going through all 3200 training examples once, but doing backpropagation (a weight update) 100 times if you set batch_size=32, since 3200/32 = 100.
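If you want to see those 100 updates, a toy run like the following (made-up data and a stand-in one-layer model, not the regressor from the course) makes Keras print 100 steps per epoch in its progress bar:
import numpy as np
from tensorflow import keras

# 3200 made-up training examples with 5 features each; shapes are arbitrary.
X_train = np.random.rand(3200, 5)
y_train = np.random.rand(3200)

# A stand-in single-layer regressor, compiled exactly like the snippet in the question.
regressor = keras.Sequential([keras.layers.Dense(1)])
regressor.compile(optimizer='adam', loss='mean_squared_error')

# 3200 / 32 = 100, so Keras logs 100 steps (weight updates) for every epoch.
regressor.fit(X_train, y_train, epochs=2, batch_size=32)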
Upvotes: 1