Reputation: 515
If I want to train a model with train_generator, is there a significant difference between choosing

10 epochs with 500 steps each

and

100 epochs with 50 steps each?
Currently I am training for 10 epochs, because each epoch takes a long time, but any graph showing improvement looks very "jumpy" because I only have 10 data points. I figure I can get a smoother graph if I use 100 epochs, but I want to know first if there is any downside to this.
Upvotes: 42
Views: 84981
Reputation: 59
steps_per_epoch tells the network how many batches to include in an epoch.
By definition, an epoch is considered complete when the dataset has been run through the model once in its entirety; in other words, when all training samples have passed through the model. (For the discussion below, let m denote the number of training examples.)
Also by definition, we know that batch_size lies in the range [1, m].
Below is what the TensorFlow documentation says about steps_per_epoch:
If you want to run training only on a specific number of batches from this Dataset, you can pass the steps_per_epoch argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.
Now suppose that your training size is m = 128 and your batch size is b = 16, which means that your data is grouped into 8 batches. According to the quote above, the maximum value you can assign to steps_per_epoch is 8, as computed in one of the answers by @Ioannis Nasios.
However, it is not necessary to set the value to exactly 8 (as in our example); you can choose any value between 1 and 8. Just be aware that training will then be performed with only that many batches per epoch.
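As a minimal sketch of how these numbers fit together (the toy data and model here are assumptions for illustration, not from the question):

import numpy as np
import tensorflow as tf

m, b = 128, 16                       # training size and batch size
max_steps = int(np.ceil(m / b))      # 8 batches in total

x_train = np.random.rand(m, 10).astype("float32")   # toy data
y_train = np.random.randint(0, 2, size=(m,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# repeat() makes the dataset infinitely-looping, so it never runs out
# no matter which steps_per_epoch value we pick.
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(b).repeat()

# steps_per_epoch can be anything from 1 to max_steps (8 here);
# with 8, one epoch sees every sample exactly once.
model.fit(dataset, epochs=10, steps_per_epoch=max_steps)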
The reason for the jumpy error values could be the size of your batch, as correctly mentioned in this answer by @Chris Farr.
Training & evaluation from tf.data Datasets
If you do this, the dataset is not reset at the end of each epoch, instead we just keep drawing the next batches. The dataset will eventually run out of data (unless it is an infinitely-looping dataset).
The advantage of a low value for steps_per_epoch is that different epochs are trained with different data (a kind of regularization). However, if you have a limited training set, using only a subset of it per epoch may not be what you want. It is a decision one has to make.
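A short sketch of that effect, continuing the toy example above: with a repeating dataset and a steps_per_epoch lower than the total number of batches, consecutive epochs keep drawing where the previous one left off, so each epoch trains on different batches.

# Only 4 of the 8 batches per epoch: epoch 1 trains on batches 1-4,
# epoch 2 on batches 5-8, and so on, since the dataset is not reset.
model.fit(dataset, epochs=10, steps_per_epoch=4)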
Upvotes: 5
Reputation: 37
Steps per epoch denotes the number of batches to be selected for one epoch. If 500 steps are selected, then the network will train on 500 batches to complete one epoch. If we select a large number of epochs, it can be computationally expensive.
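As a quick sanity check on the trade-off in the question (using the step counts assumed above): the total number of batches processed is steps_per_epoch * epochs, so the two schedules do the same amount of work overall.

# 10 epochs of 500 steps and 100 epochs of 50 steps process
# the same total number of batches.
assert 500 * 10 == 50 * 100 == 5000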
Upvotes: 1
Reputation: 8537
steps_per_epoch is not tied to the number of epochs. Naturally, what you want is for one epoch of your generator to pass through all of your training data exactly once. To achieve this, you should set steps_per_epoch equal to the number of batches, like this:
import numpy as np  # needed for np.ceil

steps_per_epoch = int(np.ceil(x_train.shape[0] / batch_size))
As the equation above shows, the larger the batch_size, the lower the steps_per_epoch.
Next, choose the number of epochs based on validation performance (choose what you think works best).
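As a self-contained sketch of that formula in use (the data shapes, generator, and model are assumptions for illustration):

import numpy as np
import tensorflow as tf

x_train = np.random.rand(1000, 10).astype("float32")   # hypothetical data
y_train = np.random.randint(0, 2, size=(1000,)).astype("float32")
batch_size = 32

def train_generator():
    # Loop forever so the generator never runs out mid-epoch.
    while True:
        idx = np.random.permutation(len(x_train))
        for start in range(0, len(x_train), batch_size):
            sel = idx[start:start + batch_size]
            yield x_train[sel], y_train[sel]

# ceil ensures the final, smaller batch is still counted.
steps_per_epoch = int(np.ceil(x_train.shape[0] / batch_size))   # 32 here

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(train_generator(), epochs=10, steps_per_epoch=steps_per_epoch)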
Upvotes: 6
Reputation: 3779
Based on what you said, it sounds like you need a larger batch_size, and of course there are implications with that which could impact the steps_per_epoch and number of epochs.
To solve for jumping-around
A larger batch size averages the gradient over more samples, so each update (and each plotted metric) is less noisy. A lower learning rate can also help smooth training.

Implications of a larger batch-size
Larger batches need more memory, especially on a GPU; if you hit the limit, dial the batch size back down until training fits. Very large batches can also hurt optimization, so increase the size gradually rather than all at once.

When to reduce epochs
If your training error keeps falling while your validation error rises, you are overfitting and should train for fewer epochs. Early stopping with a validation set finds that point automatically, as sketched below.

When to adjust steps-per-epoch
Traditionally, steps per epoch is computed as train_length // batch_size, so that every data point is used once per epoch. If you augment or generate data on the fly, you can stretch it somewhat beyond that value.
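A minimal sketch of the early-stopping setup mentioned above (the monitored metric and patience value are assumptions):

import tensorflow as tf

# Stop once validation loss stops improving, and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# Then pass it to training along with a validation set, e.g.:
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])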
Upvotes: 75