Reputation: 2129
I'm trying to build an image classification model. It's a 4 class image classification. Here is my code for building image generators and running the training:
train_datagen = ImageDataGenerator(rescale=1./255.,
rotation_range=30,
horizontal_flip=True,
validation_split=0.1)
train_generator = image_gen.flow_from_directory(train_dir, target_size=(299, 299),
class_mode='categorical', batch_size=20,
subset='training')
validation_generator = image_gen.flow_from_directory(train_dir, target_size=(299, 299),
class_mode='categorical', batch_size=20,
subset='validation')
model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy',
metrics=['accuracy'])
model.fit_generator(train_generator, steps_per_epoch=int(440/20), epochs=20,
validation_data=validation_generator,
validation_steps=int(42/20))
I was able to get train and validation work perfectly because the images in train directory are stored in a separate folder for each class. But, as you can see below, the test directory has 100 images and no folders inside it. It also doesn't have any labels and only contains image files.
How can I do prediction on the image files in test folder using Keras?
Upvotes: 3
Views: 11739
Reputation: 33420
If you are interested to only perform prediction, you can load the images by a simple hack like this:
test_datagen = ImageDataGenerator(rescale=1/255.)
test_generator = test_datagen('PATH_TO_DATASET_DIR/Dataset',
# only read images from `test` directory
classes=['test'],
# don't generate labels
class_mode=None,
# don't shuffle
shuffle=False,
# use same size as in training
target_size=(299, 299))
preds = model.predict_generator(test_generator)
You can access test_generator.filenames
to get a list of corresponding filenames so that you can map them to their corresponding prediction.
Update (as requested in comments section): if you want to map predicted classes to filenames, first you must find the predicted classes. If your model is a classification model, then probably it has a softmax layer as the classifier. So the values in preds
would be probabilities. Use np.argmax
method to find the index with highest probability:
preds_cls_idx = preds.argmax(axis=-1)
So this gives you the indices of predicted classes. Now we need to map indices to their string labels (i.e. "car", "bike", etc.) which are provided by training generator in class_indices
attribute:
import numpy as np
idx_to_cls = {v: k for k, v in train_generator.class_indices.items()}
preds_cls = np.vectorize(idx_to_cls.get)(preds_cls_idx)
filenames_to_cls = list(zip(test_generator.filenames, preds_cls))
Upvotes: 12
Reputation: 895
your folder structure be like testfolder/folderofallclassfiles
you can use
test_generator = test_datagen.flow_from_directory(
directory=pred_dir,
class_mode=None,
shuffle=False
)
before prediction i would also use reset to avoid unwanted outputs
EDIT:
For your purpose you need to know which image is associated with which prediction. The problem is that the data-generator start at different positions in the dataset each time we use the generator, thus giving us different outputs everytime. So, in order to restart at the beginning of the dataset in each call to predict_generator()
you would need to exactly match the number of iterations and batches to the dataset-size.
There are multiple ways to encounter this
a) You can see the internal batch-counter using batch_index
of generator
b) create a new data-generator before each call to predict_generator()
c) there is a better and simpler way, which is to call reset()
on the generator, and if you have set shuffle=False
in flow_from_directory
then it should start over from the beginning of the dataset and give the exact same output each time, so now the ordering of testgen.filenames
and testgen.classes
matches
test_generator.reset()
Prediction
prediction = model.predict_generator(test_generator,verbose=1,steps=numberofimages/batch_size)
To map the filename with prediction
predict_generator
gives output in probabilities so at first we need to convert them to class number like 0,1..
predicted_class = np.argmax(prediction,axis=1)
next step would be to convert those class number into actual class names
l = dict((v,k) for k,v in training_set.class_indices.items())
prednames = [l[k] for k in predicted_classes]
getting filenames
filenames = test_generator.filenames
Finally creating df
finaldf = pd.DataFrame({'Filename': filenames,'Prediction': prednames})
Upvotes: 4