user_12

How to perform prediction using predict_generator on unlabeled test data in Keras?

I'm trying to build an image classification model. It's a 4 class image classification. Here is my code for building image generators and running the training:

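from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam

# `model` (a 4-class classifier) and `train_dir` are defined elsewhere.
train_datagen = ImageDataGenerator(rescale=1./255.,
                                   rotation_range=30,
                                   horizontal_flip=True,
                                   validation_split=0.1)

# Both subsets come from the same ImageDataGenerator (train_datagen).
train_generator = train_datagen.flow_from_directory(train_dir, target_size=(299, 299),
                                                    class_mode='categorical', batch_size=20,
                                                    subset='training')

validation_generator = train_datagen.flow_from_directory(train_dir, target_size=(299, 299),
                                                         class_mode='categorical', batch_size=20,
                                                         subset='validation')

model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy',
              metrics=['accuracy'])

# fit_generator is deprecated in TF 2.x; model.fit also accepts generators.
model.fit_generator(train_generator, steps_per_epoch=int(440/20), epochs=20,
                    validation_data=validation_generator,
                    validation_steps=int(42/20))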

I was able to get training and validation to work because the images in the train directory are stored in a separate folder for each class. But, as you can see below, the test directory has 100 images and no subfolders. It has no labels and contains only image files.

How can I do prediction on the image files in test folder using Keras?

[Image: dataset directory structure]

Answers (2)

today

If you only want to perform prediction, you can load the images with a simple hack like this:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# same rescaling as in training
test_datagen = ImageDataGenerator(rescale=1/255.)

test_generator = test_datagen.flow_from_directory('PATH_TO_DATASET_DIR/Dataset',
                                                  # only read images from `test` directory
                                                  classes=['test'],
                                                  # don't generate labels
                                                  class_mode=None,
                                                  # don't shuffle
                                                  shuffle=False,
                                                  # use same size as in training
                                                  target_size=(299, 299))

preds = model.predict_generator(test_generator)

You can access test_generator.filenames to get the list of corresponding filenames (in the same order as the predictions, because shuffle=False), so that you can map each file to its prediction.


Update (as requested in the comments section): if you want to map predicted classes to filenames, you first need to find the predicted classes. If your model is a classification model, it probably has a softmax layer as the classifier, so the values in preds are probabilities. Use the argmax method to find the index with the highest probability:

preds_cls_idx = preds.argmax(axis=-1)

This gives you the indices of the predicted classes. Now we need to map the indices to their string labels (e.g. "car", "bike", etc.), which the training generator provides in its class_indices attribute:

import numpy as np

idx_to_cls = {v: k for k, v in train_generator.class_indices.items()}
preds_cls = np.vectorize(idx_to_cls.get)(preds_cls_idx)
filenames_to_cls = list(zip(test_generator.filenames, preds_cls))
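To see the index-to-label mapping in isolation, here is a minimal sketch that needs no trained model. The probabilities, the class_indices dict and the filenames are made up for illustration (hypothetical stand-ins for preds, train_generator.class_indices and test_generator.filenames), and argmax is written in plain Python so the example runs without NumPy:

```python
# Hypothetical stand-ins: in the real code these come from
# model.predict_generator(...), train_generator.class_indices
# and test_generator.filenames.
preds = [
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
class_indices = {"bike": 0, "car": 1, "plane": 2, "truck": 3}
filenames = ["test/img_001.jpg", "test/img_002.jpg", "test/img_003.jpg"]


def argmax(row):
    """Index of the largest value in one row (first one wins on ties)."""
    return max(range(len(row)), key=row.__getitem__)


# Invert {label: index} into {index: label}.
idx_to_cls = {v: k for k, v in class_indices.items()}

preds_cls_idx = [argmax(row) for row in preds]
preds_cls = [idx_to_cls[i] for i in preds_cls_idx]
filenames_to_cls = list(zip(filenames, preds_cls))
print(filenames_to_cls)
```

With real predictions you would keep the NumPy version above; the plain-Python argmax returns the first index of the largest value in each row, which matches np.argmax's tie-breaking.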

geekzeus

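Your folder structure should look like testfolder/folderofallclassfiles, i.e. all test images inside a single subfolder of the directory you pass to flow_from_directory.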
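If your test images sit directly in the test folder, as in the question, one way to get that layout is to move them into a single subfolder. A minimal sketch using only the standard library; the function name wrap_in_subfolder, the subfolder name "all" and the set of extensions are illustrative assumptions, not names Keras requires:

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; extend it if your files use others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def wrap_in_subfolder(test_dir, subfolder="all"):
    """Move every image file directly inside test_dir into test_dir/subfolder.

    Returns the number of files moved. Non-image files and existing
    subfolders are left where they are.
    """
    test_dir = Path(test_dir)
    target = test_dir / subfolder
    target.mkdir(exist_ok=True)
    moved = 0
    # sorted() materializes the listing before we start moving files.
    for path in sorted(test_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            shutil.move(str(path), str(target / path.name))
            moved += 1
    return moved
```

After calling wrap_in_subfolder(pred_dir), you pass pred_dir itself as directory, and flow_from_directory sees "all" as a single (dummy) class folder.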

Then you can use:

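from tensorflow.keras.preprocessing.image import ImageDataGenerator

# same rescaling as in training
test_datagen = ImageDataGenerator(rescale=1./255.)

test_generator = test_datagen.flow_from_directory(
    directory=pred_dir,        # e.g. testfolder/, containing folderofallclassfiles/
    target_size=(299, 299),    # must match the input size used in training
    class_mode=None,           # no labels to generate
    shuffle=False              # keep a fixed file order
)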

Before predicting, I would also call reset() on the generator to avoid unexpected outputs.

EDIT:

For your purpose you need to know which image is associated with which prediction. The problem is that the data generator starts at a different position in the dataset each time you use it, giving different outputs every time. So, to restart at the beginning of the dataset in each call to predict_generator(), you would need to match the number of iterations and batches exactly to the dataset size.
There are multiple ways to handle this:

a) Inspect the generator's internal batch counter through its batch_index attribute.
b) Create a new data generator before each call to predict_generator().
c) A better and simpler way is to call reset() on the generator. If you set shuffle=False in flow_from_directory, it starts over from the beginning of the dataset and gives exactly the same output each time, so the ordering of test_generator.filenames and test_generator.classes matches the predictions.
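The toy model below illustrates why the starting position matters. ToyBatchIterator is a hypothetical stand-in, not the Keras class: it only mimics the behaviour described above, namely an internal batch counter that persists between calls, a fixed order (as with shuffle=False), and a reset() that sets the counter back to zero.

```python
class ToyBatchIterator:
    """Hypothetical stand-in for a directory iterator with shuffle=False."""

    def __init__(self, filenames, batch_size):
        self.filenames = list(filenames)
        self.batch_size = batch_size
        self.batch_index = 0  # persists between calls, like the real counter

    def __len__(self):
        # Number of batches needed to cover every file once (ceiling division).
        return -(-len(self.filenames) // self.batch_size)

    def reset(self):
        self.batch_index = 0

    def __next__(self):
        start = self.batch_index * self.batch_size
        batch = self.filenames[start:start + self.batch_size]
        # Wrap around after the last batch, as an endless generator does.
        self.batch_index = (self.batch_index + 1) % len(self)
        return batch


def consume(it, steps):
    """Collect the files seen in `steps` batches, as a predict call would."""
    seen = []
    for _ in range(steps):
        seen.extend(next(it))
    return seen


files = [f"img_{i}.jpg" for i in range(5)]
it = ToyBatchIterator(files, batch_size=2)

consume(it, 1)                     # a stray earlier batch leaves the counter mid-epoch
misaligned = consume(it, len(it))  # starts at img_2, not img_0
it.reset()
aligned = consume(it, len(it))     # starts at img_0 again
print(misaligned)
print(aligned)
```

After one stray batch, a full pass returns the files rotated by two, so pairing outputs with filenames in order would be wrong; after reset() the order matches the filenames list again.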

test_generator.reset()

Prediction

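import math

# One step per batch; round up so the last, partial batch is included.
steps = math.ceil(test_generator.samples / test_generator.batch_size)
prediction = model.predict_generator(test_generator, verbose=1, steps=steps)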
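The steps value has to be an integer that covers every image. A quick check of the arithmetic, assuming the question's 100 test images and flow_from_directory's default batch_size of 32 (prediction_steps is a hypothetical helper name):

```python
import math


def prediction_steps(num_images, batch_size):
    """Number of batches needed so every image is predicted exactly once."""
    return math.ceil(num_images / batch_size)


print(100 / 32)                   # 3.125: a float, not a valid step count
print(int(100 / 32))              # 3 batches: only 96 images predicted
print(prediction_steps(100, 32))  # 4 batches: all 100 images (last batch has 4)
```

For a Keras directory iterator, len(test_generator) gives the same rounded-up batch count, so it can be passed as steps directly.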

To map each filename to its prediction:

predict_generator outputs probabilities, so first we need to convert them to class indices such as 0, 1, ...

import numpy as np

predicted_classes = np.argmax(prediction, axis=1)

The next step is to convert those class indices into the actual class names:

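# Invert {class_name: index} from the training generator into {index: class_name}.
idx_to_cls = dict((v, k) for k, v in train_generator.class_indices.items())
prednames = [idx_to_cls[k] for k in predicted_classes]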

Getting the filenames:

filenames = test_generator.filenames

Finally, create a DataFrame:

import pandas as pd

finaldf = pd.DataFrame({'Filename': filenames, 'Prediction': prednames})
