Reputation: 1940
I am training a CNN for binary classification of images (15k samples per class) using Keras and TensorFlow.
This is my model:
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Dropout, Flatten, Dense
from keras import regularizers

# input layer : first conv layer
model = Sequential()
model.add(Conv2D(filters=32,
                 kernel_size=(5, 5),
                 input_shape=(256, 256, 3),
                 padding='same',
                 kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.1))

# second conv layer
model.add(Conv2D(filters=64,
                 kernel_size=(5, 5),
                 padding='same',
                 kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

# third conv layer
model.add(Conv2D(filters=128,
                 kernel_size=(5, 5),
                 padding='same',
                 kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))

# fourth layer : FC layer
model.add(Flatten())
model.add(Dense(128, kernel_regularizer=regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.5))

# prediction layer
model.add(Dense(2, activation='softmax', name='prediction',
                kernel_regularizer=regularizers.l2(0.0001)))
I am using Adam (set to the default values given in the Keras documentation) as the optimiser. When I started training, the model began behaving weirdly:
Epoch 14/180
191s - loss: 0.7426 - acc: 0.7976 - val_loss: 0.7306 - val_acc: 0.7739
Epoch 15/180
191s - loss: 0.7442 - acc: 0.8034 - val_loss: 0.7284 - val_acc: 0.8018
Epoch 16/180
192s - loss: 0.7439 - acc: 0.8187 - val_loss: 0.7516 - val_acc: 0.8103
Epoch 17/180
191s - loss: 0.7401 - acc: 0.8323 - val_loss: 0.7966 - val_acc: 0.7945
Epoch 18/180
192s - loss: 0.7451 - acc: 0.8392 - val_loss: 0.7601 - val_acc: 0.8328
Epoch 19/180
191s - loss: 0.7653 - acc: 0.8471 - val_loss: 0.7776 - val_acc: 0.8243
Epoch 20/180
191s - loss: 0.7514 - acc: 0.8553 - val_loss: 0.8367 - val_acc: 0.8170
Epoch 21/180
191s - loss: 0.7580 - acc: 0.8601 - val_loss: 0.8336 - val_acc: 0.8219
Epoch 22/180
192s - loss: 0.7639 - acc: 0.8676 - val_loss: 0.8226 - val_acc: 0.8438
Epoch 23/180
191s - loss: 0.7599 - acc: 0.8767 - val_loss: 0.8618 - val_acc: 0.8280
Epoch 24/180
191s - loss: 0.7632 - acc: 0.8761 - val_loss: 0.8367 - val_acc: 0.8426
Epoch 25/180
191s - loss: 0.7651 - acc: 0.8769 - val_loss: 0.8520 - val_acc: 0.8365
Epoch 26/180
191s - loss: 0.7713 - acc: 0.8815 - val_loss: 0.8770 - val_acc: 0.8316
and so on.....
The loss is increasing, and the accuracy is also increasing (both training and validation).
Since I am using a softmax classifier, it is logical for the starting loss to be about 0.69 (-ln(0.5)), but here the loss is higher than that.
I am confused about whether this is overfitting or not. Can anyone tell me what is happening here?
Upvotes: 1
Views: 1393
Reputation: 8092
You show the data for epoch 14 and higher. Did your loss decrease monotonically in the earlier epochs? If it did, the behaviour at these later epochs is not that unusual, particularly if you are not using an adjustable learning rate. It is not uncommon for the loss to go up while the accuracy also increases; they are calculated by entirely different methods.

Try the learning-rate scheduler built into Keras, keras.callbacks.ReduceLROnPlateau. It reduces the learning rate over epochs based on whichever metric you choose to monitor. You can think of the loss function as a valley in N-dimensional space that narrows as you approach the minimum (see the figure). If your learning rate is too large as you approach the minimum (see the arrows in the figure), your loss will no longer decrease monotonically but will actually start to rise.

[Figure: loss function valley, illustrating overshooting near the minimum]
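A minimal sketch of wiring the callback into training (the factor and patience values, the batch size, and the x_train/y_train/x_val/y_val names are illustrative assumptions, not taken from your code):

from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate whenever val_loss has not improved for 3 epochs
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.5,
                              patience=3,
                              min_lr=1e-6,
                              verbose=1)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=180,
          batch_size=32,  # assumed batch size
          callbacks=[reduce_lr])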
Upvotes: 0
Reputation: 612
First of all, use binary cross-entropy for binary classification. Second, you need to tune the learning rate; I think your learning rate is too big.
P.S. If you can tell us what images you are using, it would be helpful.
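A minimal sketch of both suggestions together (the 1e-4 value is only an illustrative starting point, not a tuned rate; binary cross-entropy also assumes the output layer is a single sigmoid unit rather than a 2-unit softmax):

from keras.optimizers import Adam

# Smaller-than-default learning rate (the Keras default for Adam is 0.001)
model.compile(loss='binary_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])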
Upvotes: 0
Reputation: 10174
For binary classification, you could try changing your prediction layer to this:
model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
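Note that a single sigmoid output expects the labels as one 0/1 column rather than the two-column one-hot encoding a 2-unit softmax uses. A sketch of the conversion, assuming your labels (called y_train here) are currently one-hot:

import numpy as np

# (n, 2) one-hot labels -> (n,) vector of 0s and 1s
y_train = np.argmax(y_train, axis=1)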
Upvotes: 5