Reputation: 41428
I'm using Keras with Theano to train a basic logistic regression model.
Say I've got a training set of 1 million entries; it's too large for my system to use the standard model.fit() without exhausting memory.
To get around this, Keras provides model.fit_generator(), which pulls training data from a Python generator instead of holding it all in memory. fit_generator() has a mandatory samples_per_epoch argument; the documentation describes it as:

samples_per_epoch: integer, number of samples to process before going to the next epoch.
I'm assuming fit_generator() doesn't reset the generator each time an epoch runs, hence the need for an infinitely running generator. I typically set samples_per_epoch to the size of the training set the generator is looping over. However, if samples_per_epoch is smaller than the size of the training set the generator is working from and nb_epoch > 1: does each epoch simply pick up wherever the generator left off, and is training still valid even though a single epoch no longer covers the whole set?
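For reference, here's roughly the setup I'm describing, using the Keras 1.x API; the file names, shapes, and batch_size below are placeholders, not my real values:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Assumed: features/labels were previously saved to disk as raw float32,
    # so they can be memory-mapped instead of loaded into RAM all at once.
    train_X = np.memmap('train_X.dat', dtype='float32', mode='r', shape=(1000000, 100))
    train_y = np.memmap('train_y.dat', dtype='float32', mode='r', shape=(1000000, 1))
    batch_size = 128

    def batch_generator(X, y, batch_size):
        # Loop over the data forever: fit_generator never restarts the
        # generator itself, it just keeps pulling batches from it.
        while True:
            for start in range(0, len(X), batch_size):
                stop = start + batch_size
                yield np.asarray(X[start:stop]), np.asarray(y[start:stop])

    # Basic logistic regression: one dense unit with a sigmoid.
    model = Sequential([Dense(1, input_dim=100, activation='sigmoid')])
    model.compile(optimizer='sgd', loss='binary_crossentropy')

    # samples_per_epoch = full training-set size, so one "epoch" is one
    # complete pass of the generator over the data (Keras 1.x signature).
    model.fit_generator(batch_generator(train_X, train_y, batch_size),
                        samples_per_epoch=len(train_X),
                        nb_epoch=10)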
Upvotes: 4
Views: 2051
Reputation: 51
I'm dealing with something similar right now. I want to make my epochs shorter so I can record more information about the loss or adjust my learning rate more often.
Without diving into the code, I think the fact that fit_generator works with the randomly augmented/shuffled data produced by the Keras built-in ImageDataGenerator supports your suspicion that it doesn't reset the generator per epoch. So I believe you should be fine: as long as the model is exposed to your whole training set, it shouldn't matter if some of it is trained in a separate epoch.
If you're still worried, you could try writing a generator that randomly samples your training set.
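Something along these lines, for instance (a quick, untested sketch; X and y stand in for however you hold your data):

    import numpy as np

    def random_batch_generator(X, y, batch_size):
        # Draw each batch's rows uniformly at random, so what a given
        # "epoch" covers no longer depends on where the generator
        # previously stopped.
        n = len(X)
        while True:
            idx = np.random.randint(0, n, size=batch_size)
            yield X[idx], y[idx]

With a generator like that, a samples_per_epoch smaller than the dataset just means each epoch is a short pass over a fresh random subset, which also suits the shorter-epochs use case (more frequent loss logging and learning-rate adjustments).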
Upvotes: 3