Reputation: 154
When I call model.fit_generator() on my model, the training progress is shown in the output as you would expect. However, it stops one step short of the maximum and then moves on to validation. The validation reuses the same progress bar as the training, even though the number of validation steps is completely different (~70k training steps vs. ~8k validation steps). The validation progress bar stops when it hits the 8k steps, e.g.:
75999/76000 [===========================>..] - ETA: 0s - loss: 0.4556 - acc: 0.840Epoch 1/500
8200/76000 [====>........................] - ETA: 0s - loss: 0.9822 - acc: 0.7564
The first line is the training; the second is the validation.
When I change the steps manually so that there are fewer training steps than validation steps, I get the following output:
19/20 [===========================>..] - ETA: 0s - loss: 0.4558 - acc: 0.8980Epoch 1/500
19/20 [===========================>..] - ETA: 0s - loss: 0.8200 - acc: 0.7730
It pauses on this output while the remaining validation steps run; the progress for the rest of the validation is never shown in the bar.
This happens whether val_steps and train_steps are derived from my generators or set manually as above, so I don't think the issue is with my generators. Here is my call to fit_generator() (the behaviour is the same when I use .fit()):
model.fit_generator(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=epochs,
    validation_data=val_generator,
    validation_steps=val_steps,
    verbose=1,
    callbacks=[weight_saving_callback, early_stopping],
    max_queue_size=40,
    workers=1,
    use_multiprocessing=False,
    # train_class_weight=None,  # because we are not using target classes
    # val_class_weight=None,    # because we are not using target classes
    validation_freq=1)
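For context, a minimal sketch of how step counts like these are commonly derived from Sequence-style generators (the names below are illustrative, not my exact code):

train_steps = len(train_generator)  # number of batches per training epoch (~76000 here)
val_steps = len(val_generator)      # number of batches per validation pass (~8200 here)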
Can anyone see where this bug is? I don't think it's affecting the training process, just the output, but I can't figure out where the problem is. I'm using TF 2.1 and Keras 2.3.1.
Simply put: why does the validation progress bar not show the correct number of validation steps?
Upvotes: 1
Views: 361
Reputation: 8092
In my experience, if you try to print your own information at the end of an epoch, it messes up TensorFlow's printout. What I ended up doing is storing the items I want to report in class variables and printing them from an on_epoch_begin callback instead. Printing there does not seem to interfere with TensorFlow's end-of-epoch output.
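A minimal sketch of that idea (the callback name and the metrics stored are illustrative assumptions, not a fixed API):

import tensorflow as tf

class DeferredPrintCallback(tf.keras.callbacks.Callback):
    """Stash whatever you want to report at the end of an epoch,
    then print it at the start of the next one so the built-in
    progress bar is not interrupted."""

    def __init__(self):
        super().__init__()
        self.pending_message = None  # holds the deferred output between epochs

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Store the text instead of printing it here, which would break the bar.
        self.pending_message = (
            f"epoch {epoch + 1}: val_loss={logs.get('val_loss')}, "
            f"val_acc={logs.get('val_accuracy', logs.get('val_acc'))}"
        )

    def on_epoch_begin(self, epoch, logs=None):
        # Print the message saved at the end of the previous epoch.
        if self.pending_message is not None:
            print(self.pending_message)
            self.pending_message = None

You would then add an instance of this callback to the callbacks list passed to fit_generator() alongside the others.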
Upvotes: 1