Theoretical questions about layers in a DNN with BatchNormalization using Keras

I am having some trouble understanding DNN models that use batch normalization, specifically in Keras. Can somebody explain to me the structure and contents of each layer in this model that I built?

import time
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation, Dropout

modelbatch = Sequential()
modelbatch.add(Dense(512, input_dim=1120))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(256))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(num_classes))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('softmax'))
# Compile model
modelbatch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = modelbatch.fit(X_2, y_2, batch_size=500,
                            epochs=20, verbose=2, validation_data=(X_test, y_test))
end = time.time()

These are, I think, all the layers of my model:

print(modelbatch.layers[0].get_weights()[0].shape)
(1120, 512)
print(modelbatch.layers[0].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[0].shape)
(512,)
print(modelbatch.layers[1].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[2].shape)
(512,)
print(modelbatch.layers[1].get_weights()[3].shape)
(512,)
print(modelbatch.layers[4].get_weights()[0].shape)
(512, 256)
print(modelbatch.layers[4].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[0].shape)
(256,)
print(modelbatch.layers[5].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[2].shape)
(256,)
print(modelbatch.layers[5].get_weights()[3].shape)
(256,)
print(modelbatch.layers[8].get_weights()[0].shape)
(256, 38)
print(modelbatch.layers[8].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[0].shape)
(38,)
print(modelbatch.layers[9].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[2].shape)
(38,)
print(modelbatch.layers[9].get_weights()[3].shape)
(38,)

I would appreciate your help; thanks in advance.

Answers (1)

Coding thermodynamist

Let's go through your model:

You have your input layer with dimension 1120; connected to it, you have your first hidden layer with 512 neurons, after which come your batch normalization layer, then your activation function, and then your dropout layer. Note that you can use the command model.summary() to visualize your model; see the sketch below for one way to inspect the layers.
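For instance, this is a minimal sketch (assuming the modelbatch object from your question has been built) that prints the summary and lists which layers actually hold weights; the Activation and Dropout layers hold none, which is why your indices jump from 1 to 4 and from 5 to 8:

modelbatch.summary()  # per-layer output shapes and parameter counts

# Which layers hold weights? Activation and Dropout have empty weight lists.
for i, layer in enumerate(modelbatch.layers):
    shapes = [w.shape for w in layer.get_weights()]
    print(i, layer.__class__.__name__, shapes)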

In theory, you can (and should) think of these layers as a single layer to which you apply the following transformations: batch normalization, activation, and dropout. In practice, each operation is implemented as a separate layer in Keras because you gain modularity: instead of coding every possible way a layer could be designed, the user chooses whether to add batch norm or dropout to a given layer (a small helper like the one sketched below can recreate that grouping). To look at the modular implementation, I recommend you have a look at http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture4.pdf and, in general, at http://cs231n.stanford.edu/syllabus.html if you want to gain deeper knowledge.
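As an illustration of that modularity, here is a minimal sketch that rebuilds your architecture by grouping the four Keras layers into one conceptual block; the helper name dense_block is my own invention, not part of Keras, and num_classes = 38 is taken from your printed shapes:

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation, Dropout

def dense_block(model, units, dropout=0.5, **dense_kwargs):
    # One conceptual layer = Dense -> BatchNormalization -> Activation -> Dropout
    model.add(Dense(units, **dense_kwargs))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(dropout))

num_classes = 38
model = Sequential()
dense_block(model, 512, input_dim=1120)
dense_block(model, 256)
model.add(Dense(num_classes))
model.add(Activation('softmax'))  # output block, without batch norm (see the note below)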

For the batch-normalization layer, you have, as you noticed, 4 parameters: two trainable parameters, gamma (a scale) and beta (a shift), and two parameters that are estimated from the data (the moving mean and the moving variance). To learn what they do, look at the Stanford class; you can also find it in the original paper on batch normalization, https://arxiv.org/abs/1502.03167. It is just a trick to improve learning speed and accuracy by normalizing the activations at each layer, just as you would normalize your input data in a preprocessing step.
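Concretely, at inference time the layer applies roughly the following transformation to each feature, using the four arrays you printed; this is a NumPy sketch, assuming Keras' default epsilon of 1e-3:

import numpy as np

# Weight order of a BatchNormalization layer: gamma, beta, moving mean, moving variance
gamma, beta, moving_mean, moving_var = modelbatch.layers[1].get_weights()

def batchnorm_inference(x, eps=1e-3):
    # Normalize with the running statistics, then scale by gamma and shift by beta
    x_hat = (x - moving_mean) / np.sqrt(moving_var + eps)
    return gamma * x_hat + beta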

From what I said, you can infer the rest of your model.

N.B.: I wouldn't use a BatchNormalization layer in the last step, right before the softmax.
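In other words, the output block of your model could be simplified to something like this sketch, leaving everything else unchanged:

modelbatch.add(Dense(num_classes))
modelbatch.add(Activation('softmax'))  # no BatchNormalization before the softmax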

Is it clearer?
