kilgoretrout

Reputation: 168

Keras: Correct use of fit_generator, predict_generator, and evaluate_generator

I'm encountering weird behaviour when using fit_generator, predict_generator, and evaluate_generator, and I would like to ask the following questions, for which I could find no answer in the documentation:

  1. Is it ok to have batches of different sizes when using fit_generator?

My batches are defined time-wise: they group the events that took place in the same hour. Each batch can therefore contain a different number of events. For clarity, this is what my generators look like (following the logic in this thread):

def grouper(g, x, y):
    while True:
        for gr in g.unique():
            # boolean mask selecting all the rows in which g == gr
            indices = g == gr
            yield (x[indices], y[indices])

all_data_generator = grouper(df['batch_id'], X, Y)
train_generator = grouper(df.loc[df['set'] == 'train', 'batch_id'], X_train, Y_train)
validation_generator = grouper(df.loc[df['set'] == 'val', 'batch_id'], X_val, Y_val)
test_generator = grouper(df.loc[df['set'] == 'test', 'batch_id'], X_test, Y_test)
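For what it's worth, a quick standalone check (with made-up toy data, not the real X/Y) confirms that such a generator simply yields batches of whatever size each hour happens to have:

```python
import numpy as np
import pandas as pd

# Toy data: 5 events spread over 2 hours (batch_ids 0 and 1)
df_toy = pd.DataFrame({'batch_id': [0, 0, 1, 1, 1]})
X_toy = np.arange(10).reshape(5, 2).astype(float)
Y_toy = np.array([0, 1, 0, 1, 1])

def grouper(g, x, y):
    while True:
        for gr in g.unique():
            # boolean mask selecting all the rows in which g == gr
            indices = (g == gr).values
            yield (x[indices], y[indices])

gen = grouper(df_toy['batch_id'], X_toy, Y_toy)
xb1, yb1 = next(gen)  # hour 0: 2 events
xb2, yb2 = next(gen)  # hour 1: 3 events
print(xb1.shape, xb2.shape)  # (2, 2) (3, 2): batches of different sizes
```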
  2. Is it ok to have a different number of batches in train_generator and validation_generator?

For clarity, I pass those two (different) numbers explicitly to fit_generator in the call:

train_batches = df.loc[df['set'] == 'train', 'batch_id'].nunique()
val_batches = df.loc[df['set'] == 'val', 'batch_id'].nunique()

history = fmodel.fit_generator(train_generator, 
                             steps_per_epoch=train_batches, 
                             validation_data=validation_generator,
                             validation_steps=val_batches,
                             epochs=20, verbose = 0)
  3. Predictions are wildly different depending on whether I use predict_classes or predict_generator, which baffles me.

Here's the code:

df['pred'] = fmodel.predict_classes(X)

# returns different results from
total_batches = df['batch_id'].nunique()
df['pred_gen'] = fmodel.predict_generator(all_data_generator, steps = total_batches)
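As far as I understand, predict_classes on old-style Sequential models returns class indices (essentially an argmax over the probabilities, or a 0.5 threshold for a single sigmoid output), while predict_generator returns the raw model outputs. So the two are only comparable after applying the same reduction. A minimal sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical output of predict_generator: one row of class
# probabilities per sample, in the order the generator yields them
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7]])

# predict_classes effectively takes the argmax over the class axis,
# so this is the quantity comparable to df['pred']:
pred_classes_equiv = probs.argmax(axis=-1)
print(pred_classes_equiv)  # [1 0 1]
```

Note also that the generator yields rows grouped by batch_id, so the row order of predict_generator's output only matches df's row order if df is already sorted that way.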
  4. Similarly, evaluate and evaluate_generator return different results.

The code:

scores = model.evaluate(X_test, Y_test, verbose = 0)

# returns different results from
scores_generator = fmodel.evaluate_generator(test_generator, steps=test_batches)

I know there are already many issues referring to my points 3 and 4 (e.g., 3477, 6499), but the main takeaways there don't seem to apply to my case.

So I'm wondering whether points 1 and 2 might be the culprits here.

Upvotes: 3

Views: 3915

Answers (1)

Daniel Möller

Reputation: 86600

1 and 2

Yes, totally ok.
It's even expected that 2 be true.

3

predict_classes is not documented. What does it do exactly? I think it predicts class indices, while all the other prediction methods predict the model's actual outputs, right?

4

This is sensible...

Are you pretty sure your generator is outputting exactly what you want?

You may try to see a few batches to compare them with x and y:

for i in range(aFewBatches):
    print(next(train_generator))
    #or create some comparisons

Even if the generators are correct, you are definitely reshuffling (actually sorting) your data when you select your batches for the generator.

While evaluate will take the entire x, y data as it is, usually in batches of 32, evaluate_generator will take your selected batches. So, the metrics per batch will certainly vary, and the final result, which is a mean of the batch metrics, will also be different. So, unless the difference is too big, it's ok.

PS: I'm not entirely sure whether evaluate will give you mean batch metrics or entire data metrics, but evaluate_generator will bring mean batch metrics, which is enough for a difference.
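To make the difference concrete, here's a toy calculation (made-up numbers) showing how a plain mean of per-batch metrics diverges from the per-sample mean once batch sizes are unequal, as they are with time-grouped batches:

```python
import numpy as np

# Made-up per-batch losses and (unequal) batch sizes
batch_losses = np.array([0.50, 0.20, 0.80])
batch_sizes = np.array([10, 200, 5])

# Plain mean of batch metrics (what a mean over batches gives)
unweighted = batch_losses.mean()

# Per-sample mean (weighting each batch by its size)
weighted = (batch_losses * batch_sizes).sum() / batch_sizes.sum()

print(round(unweighted, 4), round(weighted, 4))  # 0.5 0.2279
```

With equal batch sizes the two coincide, which is why this discrepancy only shows up with variable-size batches.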

Upvotes: 3
