Vladushka

Reputation: 35

datagen.flow_from_directory function

I am using the FER2013 dataset, and when I call the datagen.flow_from_directory function, it doesn't find all the images in the directory.

Here's my code

import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64

train_data_dir = "/content/drive/My Drive/Colab/FER2013/Training"
validation_data_dir = "/content/drive/My Drive/Colab/FER2013/PublicTest"


datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255, 
    validation_split=0.2)

train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE, 
    subset='training')

val_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE, 
    subset='validation')

Here's the result:

Found 22921 images belonging to 7 classes.
Found 714 images belonging to 7 classes.

I don't get an error per se, but the Training directory contains 28,000+ images and PublicTest contains 3,000+, so why does it find only 22921 and 714 images instead of the actual number?

Upvotes: 1

Views: 3223

Answers (1)

Gerry P

Reputation: 8112

Apparently you have a separate directory for training images and a separate directory for validation images. Each should have 7 subdirectories, one per class, named identically in the training and validation directories. In the data generator you set validation_split=0.2. This takes your training images and dedicates 80% of them to training and 20% to validation, so roughly 28000 x 0.8 = 22400, which matches the 22921 you see. Likewise, val_generator with subset='validation' only picks up the 20% validation slice of the PublicTest directory, which is why it reports 714 rather than your 3,000+ images. Since you already have a separate validation directory, set validation_split=0 so that all the images in the training directory are used for training. With validation_split=0 you also do not need to specify subset in the flow_from_directory calls. Feed both train_generator and val_generator into model.fit.
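
As a minimal sketch of what that looks like (paths and sizes copied from your question; the model.fit call assumes you already have a compiled Keras model named model, and epochs=10 is just illustrative):

import tensorflow as tf

IMAGE_SIZE = 224
BATCH_SIZE = 64

train_data_dir = "/content/drive/My Drive/Colab/FER2013/Training"
validation_data_dir = "/content/drive/My Drive/Colab/FER2013/PublicTest"

# No validation_split, so every image in each directory is used
datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE)

val_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE)

# Assumes `model` is a compiled tf.keras model
model.fit(train_generator, validation_data=val_generator, epochs=10)

With this setup each generator should report the full image count of its directory.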

Upvotes: 1
