alyssaeliyah

Reputation: 2244

Python Keras - Custom Labels in ImageDataGenerator

I am building a CNN that classifies the font of an image patch as Arial, Verdana, Times New Roman, or Georgia. Because I also want to detect whether the text is regular, bold, italic, or bold italic, there are 16 classes in total: 4 fonts * 4 styles = 16 classes.

I am using the following data:

Training data set: 800 image patches of 256 x 256 pixels (50 per class)
Validation data set: 320 image patches of 256 x 256 pixels (20 per class)
Testing data set: 160 image patches of 256 x 256 pixels (10 per class)

Below is my initial code:

import numpy as np
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.layers.core import Dense, Flatten
from keras.optimizers import Adam
from keras.metrics import categorical_crossentropy
from keras.preprocessing.image import ImageDataGenerator
from keras.layers.normalization import BatchNormalization
from keras.layers.convolutional import *
import itertools
import matplotlib.pyplot as plt
import pickle


image_width = 256
image_height = 256

train_path = 'font_model_data/train'
valid_path = 'font_model_data/valid'
test_path = 'font_model_data/test'

# Class sub-directories are named '1' through '16'
class_names = [str(i) for i in range(1, 17)]

train_batches = ImageDataGenerator().flow_from_directory(
    train_path, target_size=(image_width, image_height),
    classes=class_names, batch_size=16)
valid_batches = ImageDataGenerator().flow_from_directory(
    valid_path, target_size=(image_width, image_height),
    classes=class_names, batch_size=16)
test_batches = ImageDataGenerator().flow_from_directory(
    test_path, target_size=(image_width, image_height),
    classes=class_names, batch_size=160)

# Inspect the labels of the first training batch
imgs, labels = next(train_batches)
print(labels)

#CNN model
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(image_width, image_height, 3)),
    Flatten(),
    Dense(16, activation='softmax'),  # I want to make it 4
])

I'm planning to have 4 output nodes in the network:

4 Output Nodes (4 bits):
Class 01 - 0000
Class 02 - 0001
Class 03 - 0010
Class 04 - 0011
Class 05 - 0100
Class 06 - 0101
Class 07 - 0110
Class 08 - 0111
Class 09 - 1000
Class 10 - 1001
Class 11 - 1010
Class 12 - 1011
Class 13 - 1100
Class 14 - 1101
Class 15 - 1110
Class 16 - 1111

But the labels generated by ImageDataGenerator are 16-element one-hot vectors:

[[0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

How can I assign custom labels to my classes? I want my labels to be:

labels = [[0,0,0,0],
          [0,0,0,1],
          [0,0,1,0],
          [0,0,1,1],
          [0,1,0,0],
          [0,1,0,1],
          [0,1,1,0],
          [0,1,1,1],
          [1,0,0,0],
          [1,0,0,1],
          [1,0,1,0],
          [1,0,1,1],
          [1,1,0,0],
          [1,1,0,1],
          [1,1,1,0],
          [1,1,1,1]]

The goal is to reduce the last dense layer of my network from 16 output nodes to 4, giving a less complicated architecture.

Upvotes: 2

Views: 669

Answers (2)

Vito

Reputation: 426

I don't think there is a way to do this. You can't reduce the output layer to just 4 units when you have 16 classes to classify. If you want to simplify your network, try reducing the number of hidden units and check whether the network underfits or overfits. If it overfits, you can reduce the complexity of the network; if it underfits, you need a more complex network.

If you really want to reduce the number of units in the output layer, one approach I can think of is to first classify the font, which needs 4 units, then classify the style, which needs another 4 units, and finally combine the probabilities of the two outputs. I'm not sure whether this will work, but you can try it.
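As a rough illustration of combining the font and style outputs, here is a minimal NumPy sketch. It rests on two assumptions that are not stated in the question: the 16 classes are ordered font-major (class index = font * 4 + style), and the two outputs are treated as independent so their probabilities can simply be multiplied. The names `split_class` and `combine` and the example probabilities are made up for illustration.

```python
import numpy as np

N_FONTS, N_STYLES = 4, 4

def split_class(class_index):
    """Map a class index in 0..15 to (font_index, style_index).

    Assumption: classes are ordered font-major, i.e. index = font * 4 + style.
    """
    return divmod(class_index, N_STYLES)

def combine(font_probs, style_probs):
    """Joint probability over the 16 classes, assuming independent outputs."""
    joint = np.outer(font_probs, style_probs)  # shape (4, 4)
    return joint.ravel()                       # shape (16,), font-major order

# Hypothetical softmax outputs of the two 4-unit heads for one image
font_probs = np.array([0.1, 0.7, 0.1, 0.1])
style_probs = np.array([0.2, 0.2, 0.5, 0.1])

joint = combine(font_probs, style_probs)
predicted = int(np.argmax(joint))
print(predicted, split_class(predicted))  # prints: 6 (1, 2)
```

Under these assumptions the joint prediction is just the pair of per-head argmaxes, so the multiplication only matters if you need a probability for each of the 16 classes.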

Upvotes: 1

Nicolas Gervais

Reputation: 36624

This is what you already have:

custom_labels = ['0000',  '0001', '0010', '0011', '0100', '0101', '0110', '0111',
                 '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111']

output = [[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]

You'll need to get the index of the 1 in each row:

import numpy as np

column = np.argmax(output, axis=1)
Out[10]: array([ 1,  2,  3,  0,  2,  9,  7,  4,  2, 15], dtype=int64)

With this, you can select the corresponding custom labels:

np.array(custom_labels)[column]
array(['0001', '0010', '0011', '0000', '0010', '1001', '0111', '0100',
       '0010', '1111'], dtype='<U4')

What you want, though, is each of these split into separate integers:

final_labels = np.array([list(i) for i in np.array(custom_labels)[column]]).astype(int)
array([[0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       [0, 0, 0, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 1],
       [0, 1, 1, 1],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [1, 1, 1, 1]])
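To apply this conversion to every batch rather than to a single array, one option is to wrap the iterator in a small generator. The sketch below is only an illustration: `one_hot_to_bits`, `with_bit_labels`, `bits_to_class` and `fake_batches` are hypothetical helper names, and `fake_batches` stands in for `train_batches` so the example runs without Keras or image files. It computes the bits arithmetically from the class index, which gives the same result as the string lookup above as long as the class order matches the table in the question.

```python
import numpy as np

def one_hot_to_bits(one_hot, n_bits=4):
    """Convert one-hot rows to the binary code of their class index."""
    idx = np.argmax(one_hot, axis=1)
    shifts = np.arange(n_bits - 1, -1, -1)  # [3, 2, 1, 0]: most significant bit first
    return ((idx[:, None] >> shifts) & 1).astype("float32")

def with_bit_labels(batches, n_bits=4):
    """Wrap an iterator of (images, one_hot_labels) to yield (images, bit_labels)."""
    for imgs, one_hot in batches:
        yield imgs, one_hot_to_bits(one_hot, n_bits)

def bits_to_class(bit_probs, threshold=0.5):
    """Decode per-bit outputs (e.g. sigmoid activations) back to class indices."""
    bits = (np.asarray(bit_probs) >= threshold).astype(int)
    weights = 1 << np.arange(bits.shape[1] - 1, -1, -1)  # [8, 4, 2, 1]
    return bits @ weights

def fake_batches():
    """Stand-in for train_batches: one batch of 5 dummy images with one-hot labels."""
    labels = np.eye(16, dtype="float32")[[1, 2, 3, 0, 15]]
    imgs = np.zeros((5, 8, 8, 3), dtype="float32")
    yield imgs, labels

imgs, bit_labels = next(with_bit_labels(fake_batches()))
print(bit_labels)
print(bits_to_class(bit_labels))
```

In the question's code you would presumably pass `with_bit_labels(train_batches)` to training together with an explicit number of steps per epoch, since the wrapper is a plain Python generator. Note that with 4-bit targets the last layer would need `sigmoid` activations and a loss such as `binary_crossentropy`: softmax outputs sum to 1, so they cannot represent a target like 1111.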

Upvotes: 1
