Mattia Surricchio

Reputation: 1608

Batch normalization destroys validation performance

I'm adding some batch normalization to my model in order to improve the training time, following some tutorials. This is my model:

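# Imports were omitted in the original post; tf.keras is assumed here
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, MaxPooling2D,
                                     Flatten, Dense, Dropout)

model = Sequential()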

model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', input_shape=(64,64,3)))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(256, kernel_size=(3, 3), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())

model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))

#NB: adding more parameters increases the probability of overfitting!! Try to cut instead of adding neurons!! 
model.add(Dense(units=512, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(units=20, activation='softmax'))

Without batch normalization, I get around 50% accuracy on my data. Adding batch normalization destroys my performance: validation accuracy drops to 10%.


Why is this happening?

Upvotes: 6

Views: 5732

Answers (2)

Kedaar Rao

Reputation: 127

Try using fewer batch normalization layers. It is common practice to use one at the last convolution layer. Start with just one and add more only if it improves the validation accuracy.

Upvotes: 1

learningthemachine

Reputation: 614

I'm not sure if this is what you are asking, but batch normalization is still active during validation. Its parameters are set during training and are not altered during validation.
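To make the training/validation difference concrete, here is a minimal pure-Python sketch of a single batch normalization unit. It is a toy, not the Keras implementation: the class name `BatchNorm1D` and the input values are made up for illustration, the learnable scale and offset are left out, and the defaults (momentum 0.99, epsilon 0.001, moving mean starting at 0, moving variance starting at 1) are chosen to match the Keras `BatchNormalization` defaults. It shows that the same batch can be normalized very differently in the two modes when the moving statistics have not yet caught up with the data. Whether that is what is happening in the question cannot be told from the post.

```python
import math


class BatchNorm1D:
    """Toy batch normalization over one scalar feature (no gamma/beta)."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        # Assumption: values chosen to mirror the Keras layer defaults.
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0  # moving statistics start at mean 0 ...
        self.moving_var = 1.0   # ... and variance 1

    def __call__(self, batch, training):
        if training:
            # Training mode: normalize with the statistics of this batch
            # and nudge the moving averages towards them.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: use the frozen moving averages instead.
            mean, var = self.moving_mean, self.moving_var
        scale = 1.0 / math.sqrt(var + self.epsilon)
        return [(x - mean) * scale for x in batch]


bn = BatchNorm1D()
batch = [48.0, 50.0, 52.0, 54.0]  # hypothetical activations, far from mean 0

# Only a few training steps, so the moving averages lag behind.
for _ in range(10):
    train_out = bn(batch, training=True)

val_out = bn(batch, training=False)

print("training-mode output: ", [round(v, 2) for v in train_out])
print("inference-mode output:", [round(v, 2) for v in val_out])
```

In training mode the outputs are centered around zero. In inference mode, after so few updates, they are still far from zero because the moving mean has barely moved from its initial value.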

As for why batch normalization is not working well for your model or problem in general: it's like any hyperparameter. It works well in some scenarios and poorly in others. Do you know if this is the best placement for BN within your network? Beyond that, I would need to know more about your data and problem to give any further guesses.
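On the placement question, one commonly tried alternative is to put BN between the convolution and its activation rather than after the ReLU. The sketch below rewrites the question's model that way. It assumes tf.keras, the helper `conv_block` is a made-up name, and nothing here establishes that this ordering will fix the accuracy drop. It is only one variant to compare against.

```python
from tensorflow import keras
from tensorflow.keras import layers


def conv_block(filters):
    """Conv -> BatchNorm -> ReLU -> MaxPool, with BN before the activation."""
    return [
        # use_bias=False: BN's own offset makes a separate conv bias redundant
        layers.Conv2D(filters, kernel_size=(3, 3), use_bias=False),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
    ]


model = keras.Sequential(
    [keras.Input(shape=(64, 64, 3))]
    + conv_block(16) + conv_block(64) + conv_block(128) + conv_block(256)
    + [
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(20, activation='softmax'),
    ]
)

model.summary()
```

Training this variant and the original with the same data and settings would show whether placement matters for this particular problem.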

Upvotes: 2
