Reputation: 897
I am working on an image classification problem using TensorFlow.
Dataset: I found a dataset online consisting of 5 classes. However, I want to treat this as a binary classification problem with two classes, A and B: Class A contains the images of Class 1, and Class B contains the images of the other 4 classes. The images are distributed as follows:
Class A - Contains 699 images of Class 1
Class B - Contains the images of the other 4 Classes.
300 images of Class 2
300 images of Class 3
399 images of Class 4
273 images of Class 5
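For concreteness, the grouping above can be sketched as follows. The integer labels 1 to 5 and the helper name to_binary_label are assumptions made for illustration; they are not part of the dataset.

```python
# Image counts per original class, as listed above.
original_counts = {1: 699, 2: 300, 3: 300, 4: 399, 5: 273}


def to_binary_label(original_class):
    """Map Class 1 to "A" and every other class to "B" (hypothetical helper)."""
    return "A" if original_class == 1 else "B"


# Sum the image counts under the binary labels.
binary_counts = {"A": 0, "B": 0}
for original_class, n_images in original_counts.items():
    binary_counts[to_binary_label(original_class)] += n_images

print(binary_counts)  # {'A': 699, 'B': 1272}
```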
Model: I am using the headless Inception-ResNet-v2 model from TensorFlow Hub, retraining and fine-tuning it on our dataset.
Results: Earlier, I trained the model with Class B containing only the images of Class 4 and Class 5, without any images of Class 2 or Class 3. With that dataset, I achieved an accuracy of around 92%. After adding the 300 images each of Class 2 and Class 3 to Class B, the accuracy fell to around 70% with the same model.
I would appreciate it if someone here could offer suggestions about what needs to be done to improve the model accuracy.
Upvotes: 0
Views: 117
Reputation: 8092
Initially, when class B included only Classes 4 and 5, you had 672 images in class B and 699 in class A, so the dataset was close to balanced. After adding the 300 images each from Classes 2 and 3, you have 1272 images in class B and 699 in class A, so the dataset is now unbalanced.

One thing you could do is artificially increase the number of images in class A. You can use the Keras ImageDataGenerator.flow method on your class A images to create more of them; see the Keras documentation for ImageDataGenerator. Set the save_to_dir parameter of flow to save the augmented images to your class A directory. The ImageDataGenerator parameters, such as horizontal_flip, control how the modified copies of the class A images are generated.

Alternatively, although I have not tried it, I believe the model.fit method lets you assign weights to the training samples or classes (the sample_weight and class_weight arguments). This enables you to compensate for the imbalance in the dataset.
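The weighting idea can be sketched as follows. This is a minimal illustration, not something I have run against your model: it assumes class A is encoded as label 0 and class B as label 1, and it uses the common "balanced" heuristic of total / (n_classes * class_count), which is one choice among several. The names model, train_images and train_labels in the final comment are hypothetical.

```python
# Image counts from the question: class A = Class 1, class B = Classes 2 to 5.
# The 0/1 label encoding is an assumption.
counts = {0: 699, 1: 300 + 300 + 399 + 273}


def balanced_class_weights(class_counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * n) for label, n in class_counts.items()}


class_weight = balanced_class_weights(counts)
print(class_weight)

# Hypothetical usage with an already compiled Keras model:
# model.fit(train_images, train_labels, epochs=10, class_weight=class_weight)
```

With these counts, class A gets a weight of roughly 1.41 and class B roughly 0.77, so each class contributes the same total weight to the loss.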
Upvotes: 2