Tobitor

Reputation: 1508

Configuration of CNN model for recognition of sequential data - Architecture of the top of the CNN - Parallel Layers

I am trying to configure a network for character recognition of sequential data such as license plates. I would like to use the architecture described in Table 3 of Deep Automatic Licence Plate Recognition system (link: http://www.ee.iisc.ac.in/people/faculty/soma.biswas/Papers/jain_icgvip2016_alpr.pdf).

The architecture the authors presented is this one:

(Image: CNN architecture table from the paper)

The first layers are fairly standard, but I am stuck on the top of the architecture (the part in the red frame). The authors mention 11 parallel layers, and I am unsure how to implement this in Python. I coded the architecture below, but it does not seem right to me.

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=(5, 5), input_shape = (32, 96, 3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, kernel_size=(3, 3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(256, kernel_size=(3, 3), activation = "relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024, activation = "relu"))
model.add(Dense(11*37, activation="Softmax"))
model.add(keras.layers.Reshape((11, 37)))

Could someone help? How do I need to code the top to get an architecture equivalent to the authors'?

Upvotes: 1

Views: 208

Answers (2)

Dinesh Sathia Raj

Reputation: 381

The code below builds the architecture described in the image.

import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, Flatten, MaxPooling2D, Dense, Input, Reshape, Concatenate, Dropout

def create_model(input_shape = (32, 96, 1)):
    input_img = Input(shape=input_shape)
    '''
    Add the ST Layer here.
    '''
    model = Conv2D(64, kernel_size=(5, 5), activation="relu")(input_img)
    model = MaxPooling2D(pool_size=(2, 2))(model)
    model = Dropout(0.25)(model)

    model = Conv2D(128, kernel_size=(3, 3), activation="relu")(model)
    model = MaxPooling2D(pool_size=(2, 2))(model)
    model = Dropout(0.25)(model)

    model = Conv2D(256, kernel_size=(3, 3), activation="relu")(model)
    model = MaxPooling2D(pool_size=(2, 2))(model)
    model = Dropout(0.25)(model)

    model = Flatten()(model)
    backbone = Dense(1024, activation="relu")(model)

    # 11 parallel, independent classifiers, each predicting one of 37 characters
    branches = [Dense(37, activation="softmax", name="branch_" + str(i))(backbone)
                for i in range(11)]
    
    output = Concatenate(axis=1)(branches)
    output = Reshape((11, 37))(output)
    model = Model(input_img, output)

    return model

(Image: plot of the resulting model architecture)
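As a quick usage sketch (not from the paper; the optimizer, loss, and grayscale input shape are illustrative assumptions), the model can be built and compiled like this:

model = create_model(input_shape=(32, 96, 1))  # assumes grayscale plates of size 32x96
# Labels are expected one-hot encoded per position, i.e. shape (batch, 11, 37)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()  # shows the 11 parallel "branch_i" heads feeding the concatenated output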

Upvotes: 2

Tristan Nemoz

Reputation: 2048

From my understanding, your implementation is almost correct. The authors train 11 individual classifiers taking as input the output from the Fully Connected Layer. Here, you can think of "parallel" as "independent".

However, you cannot apply the softmax activation right after the fully connected layer: a single softmax over the flat 11*37 output would normalize across all positions jointly. Since the classifiers are independent, we want each of them to output a probability for each possible character; put differently, the outputs of each classifier should sum to 1. Hence, the correct implementation would be:

...
model.add(Dense(1024, activation = "relu"))
# One output unit per (position, character) pair, with no activation yet
model.add(Dense(11*37))
model.add(keras.layers.Reshape((11, 37)))
# Softmax over the 37 character classes, applied separately to each of the 11 positions
model.add(keras.layers.Softmax(axis=-1))
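To verify that the outputs behave as intended, a quick check (assuming the full model above has been built, with the same input shape as in the question) could be:

import numpy as np
# Illustrative sanity check: each of the 11 positions should yield a distribution over 37 characters
dummy = np.random.rand(1, 32, 96, 3).astype("float32")
pred = model.predict(dummy)   # shape (1, 11, 37)
print(pred.sum(axis=-1))      # every entry should be ~1.0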

Upvotes: 1
