mdornfe1

Reputation: 2160

What's the purpose of nb_epoch in Keras's fit_generator?

It seems like I could get the exact same result by making num_samples bigger and keeping nb_epoch=1. I thought the purpose of multiple epochs was to iterate over the same data multiple times, but Keras doesn't reinstantiate the generator at the end of each epoch. It just keeps going. For example, when training this autoencoder:

import numpy as np
from keras.layers import (Convolution2D, MaxPooling2D, 
    UpSampling2D, Activation)
from keras.models import Sequential

rand_imgs = [np.random.rand(1, 100, 100, 3) for _ in range(1000)]

def keras_generator():
    # Loop over the dataset indefinitely; wrap the index so the
    # generator doesn't raise IndexError once all 1000 images are used.
    i = 0
    while True:
        print(i)
        rand_img = rand_imgs[i]
        i = (i + 1) % len(rand_imgs)
        yield (rand_img, rand_img)


layers = ([
    Convolution2D(20, 5, 5, border_mode='same', 
        input_shape=(100, 100, 3), activation='relu'),

    MaxPooling2D((2, 2), border_mode='same'),

    Convolution2D(3, 5, 5, border_mode='same', activation='relu'),

    UpSampling2D((2, 2)),

    Convolution2D(3, 5, 5, border_mode='same', activation='relu')])

autoencoder = Sequential()
for layer in layers:
    autoencoder.add(layer)

gen = keras_generator()
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
history = autoencoder.fit_generator(gen, samples_per_epoch=100, nb_epoch=2)

It seems like I get the same result with (samples_per_epoch=100, nb_epoch=2) as I do for (samples_per_epoch=200, nb_epoch=1). Am I using fit_generator as intended?

Upvotes: 4

Views: 2542

Answers (1)

Marcin Możejko

Reputation: 40506

Yes - you are right that when using keras.fit_generator these two approaches are equivalent. But there are a variety of reasons why keeping epochs is still reasonable:

  1. Logging: an epoch defines the amount of data after which you want to log important statistics about training (e.g. time taken, or loss at the end of the epoch).
  2. Keeping directory structure when using a generator to load data from your hard disk: if you know how many files your directory contains, you can adjust batch_size and nb_epoch so that one epoch corresponds to going through every example in your dataset.
  3. Keeping the structure of data when using a flow generator: if you have, e.g., a set of pictures loaded into Python and use keras ImageDataGenerator to apply different kinds of data transformations, setting batch_size and nb_epoch so that one epoch covers every example in your dataset helps you keep track of the progress of your training process.
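The equivalence the asker observed, and the logging benefit from point 1, can be sketched without Keras at all. This is an illustrative mock of fit_generator's epoch bookkeeping (not Keras internals): the generator just keeps yielding, so both settings visit the same samples in the same order, but more epochs give you more logging points.

```python
# Sketch of fit_generator-style epoch bookkeeping (illustrative, not
# Keras internals): epochs only change where logging happens, not which
# samples the generator yields.
from itertools import cycle

dataset = list(range(10))  # stand-in for 10 training samples

def run(gen, samples_per_epoch, nb_epoch):
    """Draw samples_per_epoch items per epoch; record one log per epoch."""
    seen, logs = [], []
    for epoch in range(nb_epoch):
        batch = [next(gen) for _ in range(samples_per_epoch)]
        seen.extend(batch)
        logs.append((epoch, len(batch)))  # where loss/time would be logged
    return seen, logs

# (samples_per_epoch=5, nb_epoch=2) vs (samples_per_epoch=10, nb_epoch=1)
seen_a, logs_a = run(cycle(dataset), samples_per_epoch=5, nb_epoch=2)
seen_b, logs_b = run(cycle(dataset), samples_per_epoch=10, nb_epoch=1)

print(seen_a == seen_b)          # True: same samples, same order
print(len(logs_a), len(logs_b))  # 2 1: but epochs give two log points
```

Setting samples_per_epoch to the dataset size (here, len(dataset)) is exactly the alignment described in points 2 and 3: each epoch then corresponds to one full pass over the data.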

Upvotes: 4

Related Questions