Abhijit Balaji

Reputation: 1940

training loss increases while validation accuracy increases

I am training a CNN for binary classification of images (15k samples per class) using Keras and TensorFlow.

This is my model :

# imports (Keras with the TensorFlow backend)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Activation, Dropout, Flatten, Dense
from keras import regularizers

# input layer: first conv layer
model = Sequential()
model.add(Conv2D(filters=32,
                 kernel_size=(5,5),
                 input_shape=(256,256,3),
                 padding='same',
                 kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.1))

# second conv layer
model.add(Conv2D(filters=64,
                 kernel_size=(5,5),
                 padding='same',
                 kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))
# third layer
model.add(Conv2D(filters=128,
                 kernel_size=(5,5),
                 padding='same',
                 kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.3))
# fourth layer : FC layer
model.add(Flatten())
model.add(Dense(128,kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))
# prediction layer
model.add(Dense(2,activation='softmax',name='prediction',kernel_regularizer=regularizers.l2(0.0001)))
    

I am using Adam (set to the default values given in the Keras documentation) as the optimiser. When I started training the model, it started behaving weirdly.
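For reference, the compile and fit step looks roughly like this (simplified sketch; the categorical_crossentropy loss and the x_train/y_train variables are assumptions, since I have not shown that part of the code):

from keras.optimizers import Adam

# Assumed compile/fit step (not shown above): the 2-unit softmax output
# is paired with categorical_crossentropy and one-hot encoded labels.
model.compile(optimizer=Adam(),                 # Keras defaults: lr=0.001, beta_1=0.9, beta_2=0.999
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train,                     # placeholders for the 15k-samples-per-class data
          validation_data=(x_val, y_val),
          epochs=180,
          verbose=2)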

Epoch 14/180
191s - loss: 0.7426 - acc: 0.7976 - val_loss: 0.7306 - val_acc: 0.7739

Epoch 15/180
191s - loss: 0.7442 - acc: 0.8034 - val_loss: 0.7284 - val_acc: 0.8018

Epoch 16/180
192s - loss: 0.7439 - acc: 0.8187 - val_loss: 0.7516 - val_acc: 0.8103

Epoch 17/180
191s - loss: 0.7401 - acc: 0.8323 - val_loss: 0.7966 - val_acc: 0.7945

Epoch 18/180
192s - loss: 0.7451 - acc: 0.8392 - val_loss: 0.7601 - val_acc: 0.8328

Epoch 19/180
191s - loss: 0.7653 - acc: 0.8471 - val_loss: 0.7776 - val_acc: 0.8243

Epoch 20/180
191s - loss: 0.7514 - acc: 0.8553 - val_loss: 0.8367 - val_acc: 0.8170

Epoch 21/180
191s - loss: 0.7580 - acc: 0.8601 - val_loss: 0.8336 - val_acc: 0.8219

Epoch 22/180
192s - loss: 0.7639 - acc: 0.8676 - val_loss: 0.8226 - val_acc: 0.8438

Epoch 23/180
191s - loss: 0.7599 - acc: 0.8767 - val_loss: 0.8618 - val_acc: 0.8280

Epoch 24/180
191s - loss: 0.7632 - acc: 0.8761 - val_loss: 0.8367 - val_acc: 0.8426

Epoch 25/180
191s - loss: 0.7651 - acc: 0.8769 - val_loss: 0.8520 - val_acc: 0.8365

Epoch 26/180
191s - loss: 0.7713 - acc: 0.8815 - val_loss: 0.8770 - val_acc: 0.8316

and so on.....

The loss is increasing and the accuracy is also increasing (both training and validation).

As I am using a softmax classifier, it is logical to expect a starting loss of about 0.69 (-ln(0.5)), but here the loss is higher than that.
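A quick check of that baseline, for reference (and, as far as I understand, Keras adds the L2 penalties from all the kernel_regularizer terms to the loss it reports, which may account for part of the gap):

import math

# Chance-level cross-entropy for two balanced classes with a softmax output
print(-math.log(0.5))   # ~0.6931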

I am confused whether this is over-fitting or not. Can anyone tell me what is happening here?

Upvotes: 1

Views: 1393

Answers (3)

Gerry P

Reputation: 8092

You show the data for epoch 14 and higher. For the prior epochs, did your loss decrease monotonically? If it did, then the behavior at these higher epochs is not that unusual, particularly if you are not using an adjustable learning rate. It is not uncommon for the loss to go up while accuracy also increases; they are calculated by entirely different methods.

Try using the built-in Keras learning rate adjuster, keras.callbacks.ReduceLROnPlateau. It will reduce your learning rate over epochs based on whichever metric you choose to monitor. You can think of the loss function as a valley in N-dimensional space that narrows as you approach the minimum (see attached figure). If your learning rate is too large as you approach the minimum (see the arrows in the figure), your loss will no longer decrease monotonically but will actually start to rise.

(figure: loss function valley)
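A minimal sketch of wiring up that callback (the monitor, factor and patience values are only illustrative, not tuned; x_train/y_train stand in for your data):

from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever val_loss has not improved for 3 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.5,
                              patience=3,
                              min_lr=1e-6,
                              verbose=1)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=180,
          callbacks=[reduce_lr],
          verbose=2)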

Upvotes: 0

hola

Reputation: 612

First of all, use binary cross-entropy for binary classification. Second, you need to tune the learning rate; I think your current learning rate is too big.
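For example, a minimal sketch of both changes (the 1e-4 learning rate is only an illustrative smaller value, and it assumes the prediction layer is switched to a single sigmoid unit so binary cross-entropy applies):

from keras.optimizers import Adam

# Assumes the final layer is Dense(1, activation='sigmoid') instead of the
# 2-unit softmax, so binary_crossentropy gets one probability per sample.
model.compile(optimizer=Adam(lr=1e-4),   # smaller than the default 0.001; still needs tuning
              loss='binary_crossentropy',
              metrics=['accuracy'])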

P.S. If you could tell us what images you are using, that would be helpful.

Upvotes: 0

petezurich

Reputation: 10174

For binary classification you could try changing your prediction layer to this, and compiling with binary cross-entropy:

model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
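Note that with a single sigmoid output the labels need to be one 0/1 value per sample (shape (N,) or (N, 1)) rather than the one-hot pairs a 2-unit softmax expects.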

Upvotes: 5
