Piyush jain

Reputation: 17

Keras fit_generator is training one sample at a time, while I am yielding more than one sample from the generator

I am training a model using Keras. I tried both the 'fit' and 'fit_generator' functions, and I don't understand why there is such a large difference in performance; maybe I am doing something wrong. This is the first time I have written batch generator code.

Given a batch size of 10, this is what I observed with each function:

fit: trains faster (approx. 3 min per epoch); the count in the verbose output increases in multiples of the batch size (here 10)
Sample: 80/7632 [..............................] - ETA: 4:31 - loss: 2.2072 - acc: 0.4375

fit_generator: trains much slower (10 min per epoch); the count in the verbose output increases 1 at a time (not by the batch size)
Sample: 37/7632 [..............................] - ETA: 42:25 - loss: 2.1845 - acc: 0.3676

As you can see, the ETA is far too high for fit_generator on the same dataset. fit_generator also increases the count by 1 each time, while fit increases it in multiples of 10.

Generator:

def batch_generator(X, y, batch_size=10):
    from sklearn.utils import shuffle

    batch_count = int(len(X) / batch_size)
    extra = len(X) - (batch_count * batch_size)

    while 1:
        # Shuffle X and y together at the start of each epoch
        X_train, y_train = shuffle(X, y)

        # Yield full batches
        for i in range(batch_count):
            batch_start = i * batch_size
            batch_end = batch_start + batch_size
            X_batch = X_train[batch_start: batch_end]
            y_batch = y_train[batch_start: batch_end]
            yield X_batch, y_batch

        # Yield the remaining samples (fewer than batch_size)
        if extra > 0:
            batch_start = batch_count * batch_size
            X_batch = X_train[batch_start:]
            y_batch = y_train[batch_start:]
            yield X_batch, y_batch
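
A quick sanity check (a minimal sketch using hypothetical dummy arrays, assuming the generator above) shows that each call yields a whole batch, so the one-at-a-time counting does not seem to come from the generator itself:

    import numpy as np

    # Hypothetical toy data just to exercise the generator
    X_dummy = np.random.rand(25, 4)
    y_dummy = np.random.randint(0, 2, size=25)

    gen = batch_generator(X_dummy, y_dummy, batch_size=10)
    for _ in range(3):
        X_batch, y_batch = next(gen)
        print(X_batch.shape, y_batch.shape)
    # Prints (10, 4) (10,), (10, 4) (10,), then (5, 4) (5,) for the leftover batch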

fit_generator call:

model.fit_generator(batch_generator(X, y, 10),
                    verbose = 1,
                    samples_per_epoch = len(X),
                    epochs = 20,
                    validation_data = (X_test, y_test),
                    callbacks = callbacks_list)

Can anyone explain why this is happening?

Upvotes: 0

Views: 791

Answers (1)

Dr. Snoopy

Reputation: 56357

fit_generator does not count samples, it counts steps. You are using the old Keras API with the samples_per_epoch parameter; this is incorrect here and produces wrong results. A correct fit_generator call would be:

model.fit_generator(batch_generator(X, y, 10),
                    verbose = 1,
                    steps_per_epoch = int(len(X) / batch_size),
                    epochs = 20,
                    validation_data = (X_test, y_test),
                    callbacks = callbacks_list)

steps_per_epoch controls how many steps (calls to the generator) are run before the epoch is declared over. It should be set to the total number of samples divided by the batch size. For fit_generator, the index in the progress bar refers to steps (batches), not samples, so you cannot directly compare it to the indices in the progress bar of fit.
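
If the generator also yields the final partial batch, as the one in the question does, rounding the step count up keeps that leftover batch inside the epoch (a minimal sketch; batch_size = 10 is taken from the question):

    import math

    batch_size = 10
    # Round up so the smaller leftover batch counts as its own step
    steps = math.ceil(len(X) / batch_size)

    model.fit_generator(batch_generator(X, y, batch_size),
                        verbose = 1,
                        steps_per_epoch = steps,
                        epochs = 20,
                        validation_data = (X_test, y_test),
                        callbacks = callbacks_list)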

Upvotes: 1
