Reputation: 515
If I want to train a model with train_generator, is there a significant difference between choosing

10 epochs with 500 steps each

and

100 epochs with 50 steps each?
Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 data points. I figure I can get a smoother graph if I use 100 epochs, but I want to know first if there is any downside to this.
Upvotes: 42
Views: 84981
Reputation: 59
steps_per_epoch tells the network how many batches to include in an epoch.
By definition, an epoch is considered complete when the dataset has been run through the model once in its entirety; in other words, when all training samples have passed through the model. (For the discussion below, let m denote the number of training examples.)
Also by definition, we know that batch_size lies in the range [1, m].
Below is what the TensorFlow documentation says about steps_per_epoch:
If you want to run training only on a specific number of batches from this Dataset, you can pass the steps_per_epoch argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.
Now suppose that your training size is m = 128 and your batch size is b = 16, which means that your data is grouped into 8 batches. According to the quote above, the maximum value you can assign to steps_per_epoch is 8, as computed in one of the answers by @Ioannis Nasios.
However, it is not necessary to set the value to exactly 8 (as in our example); you can choose any value between 1 and 8. Just be aware that training will then be performed with only that many batches per epoch.
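As a minimal sketch of how these numbers fit together (the toy data and model here are assumptions for illustration, not from the question):

import numpy as np
import tensorflow as tf

m, b = 128, 16                       # training size and batch size
max_steps = int(np.ceil(m / b))      # 8 batches in total

x_train = np.random.rand(m, 10).astype("float32")   # toy data
y_train = np.random.randint(0, 2, size=(m,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# repeat() makes the dataset infinitely-looping, so it never runs out
# no matter which steps_per_epoch value we pick.
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(b).repeat()

# steps_per_epoch can be anything from 1 to max_steps (8 here);
# with 8, one epoch sees every sample exactly once.
model.fit(dataset, epochs=10, steps_per_epoch=max_steps)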
The reason for the jumpy error values could be the size of your batch, as correctly mentioned in this answer by @Chris Farr.
Training & evaluation from tf.data Datasets
If you do this, the dataset is not reset at the end of each epoch, instead we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).
The advantage of a low value for steps_per_epoch is that different epochs are trained with different data (a kind of regularization). However, if you have a limited training set, using only a subset of it per epoch may not be what you want. It is a decision one has to make.
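A short sketch of that effect, continuing the toy example above: with a repeating dataset and a steps_per_epoch lower than the total number of batches, consecutive epochs keep drawing where the previous one left off, so each epoch trains on different batches.

# Only 4 of the 8 batches per epoch: epoch 1 trains on batches 1-4,
# epoch 2 on batches 5-8, and so on, since the dataset is not reset.
model.fit(dataset, epochs=10, steps_per_epoch=4)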
Upvotes: 5
Reputation: 37
Steps per epoch denotes the number of batches to be selected for one epoch. If 500 steps are selected, then the network will train on 500 batches to complete one epoch. If we select a large number of epochs, it can be computationally expensive.
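As a quick sanity check on the trade-off in the question (using the step counts assumed above): the total number of batches processed is steps_per_epoch * epochs, so the two schedules do the same amount of work overall.

# 10 epochs of 500 steps and 100 epochs of 50 steps process
# the same total number of batches.
assert 500 * 10 == 50 * 100 == 5000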
Upvotes: 1
Reputation: 8537
steps_per_epoch is not tied to the number of epochs. Naturally, what you want is for one epoch of your generator to pass through all of your training data exactly once. To achieve this, you should set steps_per_epoch equal to the number of batches, like this:
import numpy as np  # needed for np.ceil

steps_per_epoch = int(np.ceil(x_train.shape[0] / batch_size))
As the equation above shows, the larger the batch_size, the lower the steps_per_epoch.
Next, choose the number of epochs based on validation performance (choose what you think works best).
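As a self-contained sketch of that formula in use (the data shapes, generator, and model are assumptions for illustration):

import numpy as np
import tensorflow as tf

x_train = np.random.rand(1000, 10).astype("float32")   # hypothetical data
y_train = np.random.randint(0, 2, size=(1000,)).astype("float32")
batch_size = 32

def train_generator():
    # Loop forever so the generator never runs out mid-epoch.
    while True:
        idx = np.random.permutation(len(x_train))
        for start in range(0, len(x_train), batch_size):
            sel = idx[start:start + batch_size]
            yield x_train[sel], y_train[sel]

# ceil ensures the final, smaller batch is still counted.
steps_per_epoch = int(np.ceil(x_train.shape[0] / batch_size))   # 32 here

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(train_generator(), epochs=10, steps_per_epoch=steps_per_epoch)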
Upvotes: 6
Reputation: 3779
Based on what you said, it sounds like you need a larger batch_size, and of course there are implications with that which could impact the steps_per_epoch and number of epochs.
To solve for jumping-around
A larger batch size averages the gradient over more samples, so each update (and each plotted metric) is less noisy. A lower learning rate can also help smooth training.

Implications of a larger batch-size
Larger batches need more memory, especially on a GPU; if you hit the limit, dial the batch size back down until training fits. Very large batches can also hurt optimization, so increase the size gradually rather than all at once.

When to reduce epochs
If your training error keeps falling while your validation error rises, you are overfitting and should train for fewer epochs. Early stopping with a validation set finds that point automatically, as sketched below.

When to adjust steps-per-epoch
Traditionally, steps per epoch is computed as train_length // batch_size, so that every data point is used once per epoch. If you augment or generate data on the fly, you can stretch it somewhat beyond that value.
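A minimal sketch of the early-stopping setup mentioned above (the monitored metric and patience value are assumptions):

import tensorflow as tf

# Stop once validation loss stops improving, and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Then pass it to training along with a validation set, e.g.:
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])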
Upvotes: 75