Reputation: 1032
For most datasets, images are independent and there's no issue with splitting them randomly 80%-20% into train/ and test/ directories for use with Keras' flow_from_directory(). However, for my application this is not the case. For example, let's say I want to classify whether people are smiling or frowning. Instead of using thousands of images found online of random people smiling and frowning, I have recruited 10 volunteers and taken 100 images of each volunteer smiling and frowning.

In my eventual application, I want to classify whether a new user is smiling or frowning. For a fair test, I must ensure that no images of users in my test set appear in the training set (otherwise my classifier may pick up on features which are specific to that user, which I don't want), so I leave out one user and train my model on the nine others. My directory structure looks like:
user1/
    smile/
        100 images
    frown/
        100 images
...
user10/
    smile/
        100 images
    frown/
        100 images
Is there any way to feed Keras user1/ as the test/ directory and user2/ through user10/ as the train/ directories?
Note: My question is not a duplicate of this question because that concerns feeding in multiple directories in parallel for use with a single training example. My question is similar to this, but that question is so poorly written that I'm not sure if the user is asking the same question I am.
Upvotes: 3
Views: 3563
Reputation: 419
@TimD1 I believe that if you change your directory structure slightly, as shown below, you can use flow_from_directory in Keras.
Test_Directory/
    User1/
        200 images here (don't create separate smile and frown folders here)
Train_Directory/
    Smile/
        All the smile images for users 2-10
    Frown/
        All the frown images for users 2-10
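In case it helps, here is a rough sketch of how the files could be copied from the per-user layout into this layout. The dataset/ source path and the choice of user1 as the held-out user are just placeholders for your own setup:

import os
import shutil

SOURCE_DIR = 'dataset'            # contains user1/ ... user10/ (placeholder path)
TRAIN_DIR = 'Train_Directory'     # will contain Smile/ and Frown/
TEST_DIR = 'Test_Directory'       # will contain User1/ with no class subfolders
HELD_OUT_USER = 'user1'

os.makedirs(os.path.join(TEST_DIR, 'User1'), exist_ok=True)
for label in ('Smile', 'Frown'):
    os.makedirs(os.path.join(TRAIN_DIR, label), exist_ok=True)

for user in sorted(os.listdir(SOURCE_DIR)):
    for label in ('smile', 'frown'):
        src_folder = os.path.join(SOURCE_DIR, user, label)
        for fname in os.listdir(src_folder):
            src = os.path.join(src_folder, fname)
            if user == HELD_OUT_USER:
                # held-out user: all images go into one flat test folder,
                # keeping the label in the filename so it can be recovered later
                dst = os.path.join(TEST_DIR, 'User1', label + '_' + fname)
            else:
                # training users: group by class, prefix with the user name
                # to avoid filename clashes between users
                dst = os.path.join(TRAIN_DIR, label.capitalize(), user + '_' + fname)
            shutil.copy2(src, dst)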
Once you have this directory structure, you can use the following code, changing the particulars as needed for your application. The important things are the directory paths and whether or not you want to create a validation set.
from keras.preprocessing.image import ImageDataGenerator

# Training data: rescale plus light augmentation, with 10% held out for validation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.1,
                                   zoom_range=0.1,
                                   validation_split=0.1)

# Test data: rescale only (no augmentation at test time)
test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory('desktop/Train_Directory',
                                                 target_size=(64, 64),
                                                 shuffle=True,
                                                 seed=13,
                                                 batch_size=32,
                                                 class_mode='binary',
                                                 subset='training')

val_set = train_datagen.flow_from_directory('desktop/Train_Directory',
                                            target_size=(64, 64),
                                            shuffle=True,
                                            seed=13,
                                            batch_size=32,
                                            class_mode='binary',  # same class_mode as the training set
                                            subset='validation')

# For the test set, batch_size should be 1 and shuffle should be False so that
# the predictions come out in the same order as test_set.filenames
test_set = test_datagen.flow_from_directory('desktop/Test_Directory',
                                            target_size=(64, 64),
                                            shuffle=False,
                                            seed=13,
                                            class_mode=None,
                                            batch_size=1)
After this point, use fit_generator to train and predict_generator to test. If you choose to set shuffle to True for test_set, then you need to call test_set.reset() before predicting the test labels.
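For example (a minimal sketch, assuming model is an already-compiled Keras binary classifier with a sigmoid output):

# Train on users 2-10, validating on the 10% split created above
model.fit_generator(training_set,
                    steps_per_epoch=len(training_set),
                    epochs=10,
                    validation_data=val_set,
                    validation_steps=len(val_set))

# Predict on the held-out user; with shuffle=False and class_mode=None,
# predictions[i] corresponds to test_set.filenames[i]
test_set.reset()
predictions = model.predict_generator(test_set, steps=len(test_set))

Since the test folder has no class subfolders, the true labels for the held-out user have to come from somewhere else, for example from the filenames.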
Upvotes: 1