Ray

Reputation: 41428

In Keras, if samples_per_epoch is less than the number of samples the generator yields before it loops back on itself, will this negatively affect the result?

I'm using Keras with Theano to train a basic logistic regression model.

Say I've got a training set of 1 million entries; it's too large for my system to use the standard model.fit() without running out of memory.

There is a mandatory argument in fit_generator() to specify samples_per_epoch. The documentation indicates

samples_per_epoch: integer, number of samples to process before going to the next epoch.

I'm assuming fit_generator() doesn't reset the generator each time an epoch runs, hence the need for an infinitely running generator.

I typically set samples_per_epoch to the size of the training set the generator is looping over.
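For reference, here's a minimal sketch of what I'm doing (the .npy file names and batch size are just illustrative):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Hypothetical data files; mmap_mode keeps the 1M rows out of RAM.
    X_train = np.load('X_train.npy', mmap_mode='r')  # shape (1000000, n_features)
    y_train = np.load('y_train.npy', mmap_mode='r')

    def batch_generator(X, y, batch_size=128):
        """Loop over the training set forever, one batch at a time."""
        n = len(X)
        i = 0
        while True:                      # fit_generator expects an endless generator
            stop = min(i + batch_size, n)
            yield np.asarray(X[i:stop]), np.asarray(y[i:stop])
            i = stop % n                 # wrap back to the start, never reset

    # Basic logistic regression: one dense sigmoid unit.
    model = Sequential([Dense(1, input_dim=X_train.shape[1], activation='sigmoid')])
    model.compile(optimizer='sgd', loss='binary_crossentropy')

    # samples_per_epoch set to the full training-set size, as described above
    model.fit_generator(batch_generator(X_train, y_train),
                        samples_per_epoch=len(X_train),
                        nb_epoch=10)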

However, what if samples_per_epoch is smaller than the size of the training set the generator is working from and nb_epoch > 1? Will the results be negatively or unpredictably affected?

Upvotes: 4

Views: 2051

Answers (1)

Austin

Reputation: 51

I'm dealing with something similar right now. I want to make my epochs shorter so I can record more information about the loss or adjust my learning rate more often.

Without diving into the code, I think the fact that .fit_generator works with the randomly augmented/shuffled data produced by Keras's built-in ImageDataGenerator supports your suspicion that it doesn't reset the generator per epoch. So I believe you should be fine: as long as the model is exposed to your whole training set, it shouldn't matter if some of it is trained in a separate epoch.
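To illustrate that pattern, here's a toy example (the data, model, and augmentation settings are arbitrary stand-ins; shapes assume Theano channels-first ordering):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Flatten, Dense
    from keras.preprocessing.image import ImageDataGenerator

    # Toy stand-ins for real image tensors, (samples, channels, rows, cols).
    X_images = np.random.rand(100, 3, 32, 32).astype('float32')
    y_labels = np.random.randint(0, 2, size=(100, 1))

    model = Sequential([Flatten(input_shape=(3, 32, 32)),
                        Dense(1, activation='sigmoid')])
    model.compile(optimizer='sgd', loss='binary_crossentropy')

    # flow() is an endless iterator: it keeps yielding freshly augmented
    # batches across epoch boundaries, so fit_generator never "resets" it.
    datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)
    model.fit_generator(datagen.flow(X_images, y_labels, batch_size=32),
                        samples_per_epoch=len(X_images),
                        nb_epoch=5)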

If you're still worried, you could try writing a generator that randomly samples your training set.
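A minimal sketch of what I mean (names and batch size are just placeholders):

    import numpy as np

    def random_sample_generator(X, y, batch_size=128):
        """Yield batches drawn uniformly at random, forever. Which samples
        land in which epoch no longer matters: every batch is a fresh draw."""
        n = len(X)
        while True:
            idx = np.random.randint(0, n, size=batch_size)
            yield X[idx], y[idx]

    # Drop-in replacement for a sequential generator:
    # model.fit_generator(random_sample_generator(X_train, y_train),
    #                     samples_per_epoch=len(X_train), nb_epoch=10)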

Upvotes: 3
