Jon

Reputation: 67

Python model.fit_generator gets stuck on first epoch and tries to compute "unknown" number of steps

I am very new to Python and image classification models, but I have some TensorFlow code that worked fine until recently. Now, when I reach this part of the code, I suddenly run into a problem. I am running it in Google Colab notebooks.

epochs = 5

history = model.fit_generator(train_generator, 
                    epochs=epochs,                 
                    validation_data=val_generator)

fit_generator is unable to compute the number of steps per epoch and lists it as Unknown. The first epoch then runs indefinitely, with the accuracy slowly ticking up towards 1 if I leave it long enough.

Epoch 1/5
    325/Unknown - 992s 3s/step - loss: 0.2221 - accuracy: 0.9318

Does anyone have ideas what would cause it to have an Unknown number of steps per epoch and never get past epoch 1?

Here is some more information from the code that may be relevant (the training set has 1602 images and the test set has 395, across 11 classes):

Found 1602 images belonging to 11 classes.
Found 395 images belonging to 11 classes.

The batch size was set to 64.

for image_batch, label_batch in train_generator:
  break
image_batch.shape, label_batch.shape
((64, 224, 224, 3), (64, 11))

IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)

# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                              include_top=False, 
                                              weights='imagenet')
base_model.trainable = False
model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(11, activation='softmax')
])

Model summary:

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
mobilenetv2_1.00_224 (Model) (None, 7, 7, 1280)        2257984   
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 5, 5, 32)          368672    
_________________________________________________________________
dropout_2 (Dropout)          (None, 5, 5, 32)          0         
_________________________________________________________________
global_average_pooling2d_2 ( (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 11)                363       
=================================================================
Total params: 2,627,019
Trainable params: 369,035
Non-trainable params: 2,257,984

Upvotes: 0

Views: 1636

Answers (2)

Anmol Deep

Reputation: 681

It could be an issue related to multiprocessing. Try setting workers=1 and use_multiprocessing=False in the fit_generator call. That worked for me.
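
A minimal sketch of that call, assuming a TensorFlow 2.x release where fit_generator still exists and accepts workers and use_multiprocessing. The helper name fit_single_process is hypothetical; model, train_generator and val_generator are the objects from the question:

```python
def fit_single_process(model, train_generator, val_generator, epochs=5):
    """Train with data loading kept to one worker and no multiprocessing."""
    return model.fit_generator(
        train_generator,
        epochs=epochs,
        validation_data=val_generator,
        workers=1,                  # a single data-loading worker
        use_multiprocessing=False,  # do not spawn separate worker processes
    )
```

Calling history = fit_single_process(model, train_generator, val_generator) would then replace the original fit_generator line.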

Upvotes: 0

Bashir Kazimi

Reputation: 1377

You should pass the steps_per_epoch and validation_steps parameters to fit_generator so the model knows how many batches make up one pass over the training and validation sets.

The values for those parameters are usually the number of examples divided by the batch size. In this case:

steps_per_epoch = 1602//64
validation_steps = 395//64

Then:

model.fit_generator(train_generator,
                epochs=epochs,
                validation_data=val_generator,
                steps_per_epoch=steps_per_epoch,
                validation_steps=validation_steps)
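
Note that floor division skips the final partial batch: with a batch size of 64, that leaves out 2 training images and 11 validation images on each pass. If every image should be seen, rounding up is one alternative. A minimal sketch, where steps_for is a hypothetical helper and the counts are the ones reported in the question:

```python
import math

def steps_for(num_samples, batch_size):
    """Batches needed to cover every sample once, counting the last partial batch."""
    return math.ceil(num_samples / batch_size)

batch_size = 64
steps_per_epoch = steps_for(1602, batch_size)   # 26, whereas 1602 // 64 gives 25
validation_steps = steps_for(395, batch_size)   # 7, whereas 395 // 64 gives 6
```

These values can be passed to fit_generator in place of the floor-divided ones.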

Upvotes: 1
