bitemybyte

Reputation: 1089

fit_generator trains with 0 accuracy

I'm trying to train a model from scratch using TensorFlow, Keras and ImageDataGenerator, but it does not go as expected. I use the generator only for loading the images, so no data augmentation is applied. There are two folders with train and test data; each folder has 36 subfolders filled with images. I get the following output:

Using TensorFlow backend.
Found 13268 images belonging to 36 classes.
Found 3345 images belonging to 36 classes.
Epoch 1/2
1/3 [=========>....................] - ETA: 0s - loss: 15.2706 - acc: 0.0000e+00
3/3 [==============================] - 1s 180ms/step - loss: 14.7610 - acc: 0.0667 - val_loss: 15.6144 - val_acc: 0.0312
Epoch 2/2
1/3 [=========>....................] - ETA: 0s - loss: 14.5063 - acc: 0.1000
3/3 [==============================] - 0s 32ms/step - loss: 15.5808 - acc: 0.0333 - val_loss: 15.6144 - val_acc: 0.0312

Even though it seems OK, apparently it does not train at all. I've tried different numbers of epochs, different step counts and larger datasets - almost nothing changes. Each epoch takes around half a second to train, even with over 60k images! The weird thing is that when I tried saving images to their respective folders, it saved only around 500-600 of them and then most likely stopped.

from tensorflow.python.keras.applications import ResNet50
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense, Flatten, GlobalAveragePooling2D, Conv2D, Dropout
from tensorflow.python.keras.applications.resnet50 import preprocess_input
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
import keras
import os

if __name__ == '__main__':
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

    image_size = 28
    img_rows = 28
    img_cols = 28
    num_classes = 36

    data_generator = ImageDataGenerator()

    train_generator = data_generator.flow_from_directory(
        directory="/final train 1 of 5/",
        save_to_dir="/image generator output/train/",
        target_size=(image_size, image_size),
        color_mode="grayscale",
        batch_size=10,
        class_mode='categorical')

    validation_generator = data_generator.flow_from_directory(
        directory="/final test 1 of 5/",
        save_to_dir="/image generator output/test/",
        target_size=(image_size, image_size),
        color_mode="grayscale",
        class_mode='categorical')

    model = Sequential()
    model.add(Conv2D(20, kernel_size=(3, 3),
                     activation='relu',
                     input_shape=(img_rows, img_cols, 1)))
    model.add(Conv2D(20, kernel_size=(3, 3), activation='relu'))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer='adam',  # adam/sgd
                  metrics=['accuracy'])

    model.fit_generator(train_generator,
                        steps_per_epoch=3,
                        epochs=2,
                        validation_data=validation_generator,
                        validation_steps=1)

It seems like something silently fails and cripples the training process.

Upvotes: 0

Views: 656

Answers (2)

bitemybyte

Reputation: 1089

As @today suggested, the problem was that the images were not normalized.

Passing rescale=1/255 to ImageDataGenerator solved it.
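For anyone hitting the same issue: the rescale factor simply multiplies every pixel by 1/255, mapping raw 8-bit grayscale values (0-255) into the [0, 1] range networks train well on. A minimal sketch of that effect, with plain Python standing in for the generator's per-pixel multiplication (no Keras required):

```python
# ImageDataGenerator(rescale=1/255) multiplies each pixel by the
# factor before the batch reaches the model; this function mimics
# that per-pixel scaling.
def rescale(pixels, factor=1 / 255):
    """Map raw 8-bit values (0-255) into roughly [0, 1]."""
    return [p * factor for p in pixels]

raw = [0, 128, 255]   # raw grayscale intensities
scaled = rescale(raw)
print(scaled)
```

Without this, the large raw inputs tend to saturate the activations, which matches the flat accuracy seen in the question.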

Upvotes: -1

today

Reputation: 33410

The problem is that you are misunderstanding the steps_per_epoch argument of fit_generator. Let's take a look at the documentation:

steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of samples of your dataset divided by the batch size. Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.

So basically, it determines how many batches are generated in each epoch. Since, by definition, an epoch means going over the whole training data, we must set this argument to the total number of samples divided by the batch size. So in your example it would be steps_per_epoch = 13268 // 10. Of course, as mentioned in the docs, you can leave it unspecified and it will be inferred automatically.

Further, the same thing applies to validation_steps argument as well.
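Putting both arguments together, a quick sanity check using the sample counts printed in the question's output (13268 training and 3345 validation images; the validation generator falls back on flow_from_directory's default batch_size of 32, since none is passed):

```python
import math

train_samples = 13268   # "Found 13268 images belonging to 36 classes."
val_samples = 3345      # "Found 3345 images belonging to 36 classes."
train_batch = 10        # batch_size passed to the train generator
val_batch = 32          # flow_from_directory's default batch_size

# One epoch should visit every sample once; round up so the final
# partial batch is not dropped.
steps_per_epoch = math.ceil(train_samples / train_batch)
validation_steps = math.ceil(val_samples / val_batch)

print(steps_per_epoch)   # 1327
print(validation_steps)  # 105
```

With steps_per_epoch=3 and batch_size=10, each "epoch" saw only 30 images out of 13268, which is why training finished in a fraction of a second and why only a few hundred images were ever written to save_to_dir.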

Upvotes: 4
