TimD1

Reputation: 1032

Keras: Using multiple directories in flow_from_directory()

For most datasets, images are independent and there's no issue with splitting them randomly 80%-20% into train/ and test/ directories for use with Keras' flow_from_directory(). However, for my application this is not the case. For example, let's say I want to classify whether people are smiling or frowning. Instead of using thousands of images found online of random people smiling and frowning, I have recruited 10 volunteers and taken 100 images of each volunteer smiling and frowning. In my eventual application, I want to classify whether a new user is smiling or frowning. For a fair test, I must ensure that no images of users in my test set appear in the training set (otherwise my classifier may pick up on features which are specific to that user, which I don't want), so I leave out one user and train my model on the nine others. My directory structure looks like:

user1/
    smile/
        100 images
    frown/
        100 images
...
user10/
    smile/
        100 images
    frown/
        100 images

Is there any way to feed Keras user1/ as the test/ directory and user2/ through user10/ as the train/ directories?

Note: My question is not a duplicate of this question because that concerns feeding in multiple directories in parallel for use with a single training example. My question is similar to this, but that question is so poorly written that I'm not sure if the user is asking the same question I am.

Upvotes: 3

Views: 3563

Answers (1)

Aditya Lahiri

Reputation: 419

@TimD1 I believe that if you change your directory structure slightly, as shown below, you can use flow_from_directory in Keras.

Test_Directory/
    User1/
        200 images here (don't create separate folders for smile and frown)

Train_Directory/
    Smile/
        All the smile images for users 2-10
    Frown/
        All the frown images for users 2-10
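One way to get from the per-user layout in the question to this layout is to copy the files with a short script. The following is only a sketch: the function name build_split, the lowercase smile/frown folder names, and the choice to prefix each copied file with its user and class (to avoid name clashes and to keep the true label visible in the test file names) are assumptions, not part of the original answer.

```python
import shutil
from pathlib import Path


def build_split(source_root, dest_root, test_user, classes=("smile", "frown")):
    """Copy user*/<class>/ images into Train_Directory/ and Test_Directory/.

    Hypothetical helper: images of `test_user` go into a single folder with no
    class subfolders; images of every other user are pooled by class.
    Returns the number of files copied into each split.
    """
    source_root, dest_root = Path(source_root), Path(dest_root)
    counts = {"train": 0, "test": 0}
    for user_dir in sorted(p for p in source_root.iterdir() if p.is_dir()):
        is_test = user_dir.name == test_user
        for cls in classes:
            class_dir = user_dir / cls
            if not class_dir.is_dir():
                continue
            if is_test:
                # Held-out user: one folder, no smile/frown subfolders
                target = dest_root / "Test_Directory" / user_dir.name
            else:
                # All other users: pooled into Smile/ and Frown/
                target = dest_root / "Train_Directory" / cls.capitalize()
            target.mkdir(parents=True, exist_ok=True)
            for img in sorted(class_dir.iterdir()):
                if img.is_file():
                    # Prefix with user and class so file names never collide
                    new_name = f"{user_dir.name}_{cls}_{img.name}"
                    shutil.copy2(img, target / new_name)
                    counts["test" if is_test else "train"] += 1
    return counts


# Example call (paths are placeholders):
# build_split("data", "desktop", test_user="user1")
```

Calling it once per held-out user (with a fresh destination folder each time) would give a leave-one-user-out set of splits.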

Once you have this directory structure, you can use the following code and adjust the details for your application. The important parts are the directory paths and whether you want to create a validation set.

from keras.preprocessing.image import ImageDataGenerator

# Augment the training images and hold out 10% of them for validation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.1,
                                   zoom_range=0.1,
                                   validation_split=0.1)

# Only rescale the test images; augmentation is not needed at test time
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory('desktop/Train_Directory',
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary',
                                                 shuffle=True,
                                                 seed=13,
                                                 subset='training')

# Use the same class_mode as training_set so the labels have the same shape
val_set = train_datagen.flow_from_directory('desktop/Train_Directory',
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary',
                                            shuffle=True,
                                            seed=13,
                                            subset='validation')

# class_mode=None yields images only. With batch_size=1 and shuffle=False you
# get exactly one prediction per image, in the same order as the files.
test_set = test_datagen.flow_from_directory('desktop/Test_Directory',
                                            target_size=(64, 64),
                                            batch_size=1,
                                            class_mode=None,
                                            shuffle=False)

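After this point, use fit_generator to train and predict_generator to test (recent Keras versions also accept generators directly in fit and predict). Call test_set.reset() before predicting so the generator starts from the first image. Keep shuffle=False for test_set; otherwise the predictions will not line up with test_set.filenames.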
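To turn the raw predictions into smile/frown labels per file, something like the helper below might work. It is a sketch under several assumptions: the model ends in a single sigmoid unit (matching class_mode='binary'), the probabilities come from predict_generator on test_set, the file names come from test_set.filenames, and the class mapping comes from training_set.class_indices (Keras assigns indices to class folders in alphanumeric order, so Frown would be 0 and Smile 1). The name label_predictions and the 0.5 threshold are illustrative only. The helper takes plain sequences, so it runs without Keras installed.

```python
def label_predictions(filenames, probabilities, class_indices, threshold=0.5):
    """Map each test file to a predicted class name.

    filenames      -- e.g. test_set.filenames (order must match the predictions)
    probabilities  -- one sigmoid output per image, either [p] or a bare p
    class_indices  -- e.g. training_set.class_indices, {'Frown': 0, 'Smile': 1}
    """
    # Invert the name -> index mapping so an index can be looked up by value
    index_to_class = {index: name for name, index in class_indices.items()}
    labelled = {}
    for filename, prob in zip(filenames, probabilities):
        # Each prediction row may be a length-1 sequence or a scalar
        p = float(prob[0]) if hasattr(prob, "__len__") else float(prob)
        labelled[filename] = index_to_class[int(p >= threshold)]
    return labelled


# Hand-written example values, not real model output
example = label_predictions(
    ["User1/a.jpg", "User1/b.jpg"],
    [[0.9], [0.2]],
    {"Frown": 0, "Smile": 1},
)
```

Because Test_Directory/User1/ has no class subfolders, the true labels are not available from the generator; they would have to be recovered separately (for example from the file names) to compute an accuracy.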

Upvotes: 1
