Reputation: 8112
I trained a model that takes image inputs and classifies them into one of two categories. Training converged well, with about 92% accuracy on the test set. I saved the model and created a new Python program that loads the model and then makes predictions on the same test set. It gave the same results as the previous program.

I became curious about the few classification errors that were made. I have my test directories arranged as Test/ClassA and Test/ClassB. Each directory contains 30 images, named for convenience 1.jpg through 30.jpg. In the prediction results, three images were classified incorrectly. I then took an image that was classified correctly and used it to replace the contents of the three files that had been misclassified. I reran the predictions expecting the three errors to be gone, but they are still there. I suspect the ImageDataGenerator is doing something strange with the file ordering, but I do not know what. The relevant code is shown below.
import keras
from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

test_path = 'c:/Temp/people/test'  # path to test directory
test_batch_size = 30  # 30 felon images and 30 nonfelon images
test_batches = ImageDataGenerator(
    preprocessing_function=keras.applications.mobilenet.preprocess_input
).flow_from_directory(test_path, target_size=(224, 224), batch_size=test_batch_size, shuffle=False)
test_labels = test_batches.classes  # true class index of each image, in generator order
y = test_batches.class_indices  # dict mapping class name to class index
step_size = len(y)  # 2 classes of 30 images each, so 2 batches of 30 cover the test set
model = load_model('c:/Temp/people/felon_classifier.h5')
predictions = model.predict_generator(test_batches, steps=step_size, verbose=0)
The remaining code simply formats the prediction output so each prediction is printed out with the class (ClassA or ClassB) and the probability.
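For illustration, here is a minimal sketch of what such a formatting step might look like. It is not the original code. The function name `format_predictions` is hypothetical, and the sketch assumes the file names come from the generator's `filenames` attribute, so that each prediction row is printed next to the file it was actually computed from. The demo values are made up.

```python
def format_predictions(filenames, predictions, class_indices):
    """Pair each prediction row with the file it was computed from.

    filenames     : file paths in generator order (assumed to come from
                    test_batches.filenames)
    predictions   : sequence of per-class probability rows
    class_indices : dict mapping class name -> column index
    """
    # Invert {'ClassA': 0, 'ClassB': 1} into {0: 'ClassA', 1: 'ClassB'}.
    index_to_class = {index: name for name, index in class_indices.items()}
    lines = []
    for fname, row in zip(filenames, predictions):
        # Column with the highest probability is the predicted class.
        best = max(range(len(row)), key=lambda i: row[i])
        lines.append(f"{fname}: {index_to_class[best]} ({row[best]:.2%})")
    return lines


# Hypothetical values standing in for test_batches.filenames and model output.
demo = format_predictions(
    ["ClassA/1.jpg", "ClassA/10.jpg"],
    [[0.91, 0.09], [0.20, 0.80]],
    {"ClassA": 0, "ClassB": 1},
)
print("\n".join(demo))
```

Printing the file name beside each prediction avoids relying on any assumption about the order in which the files are read.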
Does anyone know what I could be doing wrong? Any help would be most appreciated. Given this problem, I am not sure I can trust the prediction results.
Upvotes: 0
Views: 224
Reputation: 8112
I found the cause of the misclassifications. It is subtle and has to do with the naming of the files in the test directories. Mine were named 1.jpg, 2.jpg, and so on up to 30.jpg. Windows lists them in numeric order: 1.jpg, 2.jpg, 3.jpg, up to 30.jpg. What I discovered is that the ImageDataGenerator does NOT read the files in that order. It sorts the file names as strings, so it reads them as 1.jpg, 10.jpg, 11.jpg, ..., 19.jpg, 2.jpg, 20.jpg, ..., 29.jpg, 3.jpg, 30.jpg, 4.jpg, 5.jpg, etc. As a result, I was attributing each prediction to the wrong file.

To avoid this problem, pad single-digit file names with a leading 0: 01.jpg, 02.jpg, 03.jpg, up to 09.jpg, then 10.jpg up to 30.jpg. That ensures the OS and the ImageDataGenerator read the files in the same order. If you have 100 or more files, you need three digits: 001.jpg, 002.jpg, up to 009.jpg, and the two-digit names get one leading zero, such as 010.jpg, 020.jpg, etc.

After making those changes everything works as it should. As I said, a simple fix but a very subtle problem. I wrote a small Python function that determines the number of files in a directory and renumbers them with the required zero padding based on how many files are in the directory. Handy if you have a lot of files to rename.
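A rough sketch of that kind of renaming helper, for anyone who wants a starting point. This is a reconstruction, not the exact function I used, and the names `padded_names` and `pad_directory` are made up for this example. It assumes every file to rename has a purely numeric stem (1.jpg, 2.jpg, ...), and it picks the padding width from the largest number rather than from the file count, which comes to the same thing when the files are numbered 1 through N. The first part also shows the string ordering that causes the problem.

```python
import os


def padded_names(filenames):
    """Map each numeric file name to a zero-padded one, e.g. 1.jpg -> 01.jpg."""
    stems = [os.path.splitext(name)[0] for name in filenames]
    # Width is the digit count of the largest number (assumes numeric stems).
    width = len(str(max(int(stem) for stem in stems)))
    mapping = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        mapping[name] = stem.zfill(width) + ext
    return mapping


def pad_directory(directory):
    """Rename numeric files in `directory` in place so they sort correctly."""
    names = [n for n in os.listdir(directory)
             if os.path.splitext(n)[0].isdecimal()]
    if not names:
        return
    for old, new in padded_names(names).items():
        if old == new:
            continue
        target = os.path.join(directory, new)
        # Refuse to overwrite an existing file (e.g. both 1.jpg and 01.jpg).
        if os.path.exists(target):
            raise FileExistsError(target)
        os.rename(os.path.join(directory, old), target)


# Sorting the names as strings reproduces the unexpected order.
names = [f"{i}.jpg" for i in range(1, 31)]
print(sorted(names)[:5])             # ['1.jpg', '10.jpg', '11.jpg', '12.jpg', '13.jpg']

# After padding, string order and numeric order agree.
mapping = padded_names(names)
print(sorted(mapping.values())[:5])  # ['01.jpg', '02.jpg', '03.jpg', '04.jpg', '05.jpg']
```

Run `pad_directory` once on each class directory (Test/ClassA and Test/ClassB in my case) before creating the generator. It is worth trying it on a copy of the directory first.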
Upvotes: 1