I have built and tested two convolutional neural network models (VGG-16 and a 3-layer CNN) to classify lung CT scans for COVID-19.
Prior to classification, I performed image segmentation via k-means clustering on the images to try to improve classification performance.
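The post does not show how the k-means step was set up. Below is a minimal sketch of one common variant: it clusters the pixel intensities of a single grayscale slice using NumPy only. The function name `kmeans_segment`, the evenly spaced initialization and `k=3` are assumptions, not the asker's code. Each pixel is assigned to the nearest intensity centre, each centre is recomputed as the mean of its pixels, and the loop stops once the centres stop moving.

```python
import numpy as np


def kmeans_segment(image, k=3, n_iter=50):
    """Cluster the pixel intensities of a 2-D grayscale image into k groups.

    Returns a label map (0 = darkest cluster) and the sorted cluster centres.
    """
    pixels = image.reshape(-1).astype(np.float64)
    # Deterministic initialization: centres spread evenly over the intensity range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(n_iter):
        # Assignment step: each pixel goes to the nearest centre in intensity.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Update step: each centre moves to the mean of its pixels (kept if empty).
        new_centers = np.array([
            pixels[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the converged centres.
    labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
    # Relabel so that cluster 0 is the darkest and k-1 the brightest.
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels].reshape(image.shape), centers[order]
```

Usage would look like `labels, centers = kmeans_segment(ct_slice, k=3)`, where `ct_slice` is a hypothetical 2-D array. Air is dark on CT, so the darkest cluster (`labels == 0`) is one candidate mask for the lung fields. It will also include the air outside the body, and which cluster actually matches lung tissue depends on the windowing, so it should be checked visually.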
The segmented images look like the example below.
I trained and evaluated the VGG-16 model separately on the segmented images and on the raw images. Finally, I trained and evaluated a 3-layer CNN on the segmented images only. Below are the training/validation loss and accuracy curves for each run.
For the simple 3-layer CNN, I can clearly see that the model trains well and that it starts to overfit after about 2 epochs. However, I don't understand why the validation accuracy of the VGG model doesn't follow the usual rising curve; instead, it looks like a flat or fluctuating horizontal line. Moreover, the simple 3-layer CNN seems to perform better. Is this due to vanishing gradients in the VGG model? Or are the images simple enough that a deep architecture doesn't benefit? I'd appreciate it if you could share your knowledge of this kind of learning behaviour.
This is the code for the VGG-16 model:
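# imports (assumes TensorFlow 2.x with tf.keras)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

# build model
img_height = 256
img_width = 256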
model = Sequential()
# Block 1: two 3x3 conv layers with 64 filters on the 1-channel input, then 2x2 max-pool (256 -> 128)
model.add(Conv2D(input_shape=(img_height, img_width, 1), filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Block 2: two conv layers with 128 filters, then pool (128 -> 64)
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
# Block 3: three conv layers with 256 filters, then pool (64 -> 32)
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
# Block 4: three conv layers with 512 filters, then pool (32 -> 16)
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
# Block 5: three conv layers with 512 filters, then pool (16 -> 8)
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2),strides=(2,2)))
# Classifier head: two 4096-unit fully connected layers, then one sigmoid unit for the binary label
model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=1, activation="sigmoid"))
opt = Adam(learning_rate=0.001)
model.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
And this is the code for the 3-layer CNN:
# build model
model2 = Sequential()
# Two conv/pool stages (256 -> 128 -> 64 per side)
model2.add(Conv2D(32, 3, padding='same', activation='relu', input_shape=(img_height, img_width, 1)))
model2.add(MaxPool2D())
model2.add(Conv2D(64, 5, padding='same', activation='relu'))
model2.add(MaxPool2D())
# One hidden Dense layer, then one sigmoid unit for the binary label
model2.add(Flatten())
model2.add(Dense(128, activation='relu'))
model2.add(Dense(1, activation='sigmoid'))
opt = Adam(learning_rate=0.001)
model2.compile(optimizer=opt, loss=keras.losses.binary_crossentropy, metrics=['accuracy'])
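For scale, the sketch below counts the trainable parameters of the two models exactly as written above (256x256x1 input, "same" padding, 2x2 max-pooling). It uses the standard formulas: k·k·C_in·C_out + C_out for a Conv2D layer and n_in·n_out + n_out for a Dense layer. The helper names are hypothetical. The totals come to about 165.7M parameters for the VGG-16, about 151M of them in the three Dense layers, and about 33.6M for the 3-layer CNN, almost all in its first Dense layer. That is a lot to fit from a small dataset. Keras' `model.count_params()` should report the same totals.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # kernel x kernel x in_ch weights per filter, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    # full weight matrix plus one bias per unit
    return n_in * n_out + n_out


def vgg16_params(h=256, w=256, in_ch=1):
    """(conv, dense) parameter counts for the VGG-16 Sequential model above."""
    convs = [(in_ch, 64), (64, 64),                  # block 1
             (64, 128), (128, 128),                  # block 2
             (128, 256), (256, 256), (256, 256),     # block 3
             (256, 512), (512, 512), (512, 512),     # block 4
             (512, 512), (512, 512), (512, 512)]     # block 5
    conv_total = sum(conv2d_params(3, c_in, c_out) for c_in, c_out in convs)
    flat = (h // 32) * (w // 32) * 512  # five 2x2 pools: 256 -> 8 per side
    dense_total = (dense_params(flat, 4096) + dense_params(4096, 4096)
                   + dense_params(4096, 1))
    return conv_total, dense_total


def small_cnn_params(h=256, w=256, in_ch=1):
    """(conv, dense) parameter counts for the 3-layer CNN above."""
    conv_total = conv2d_params(3, in_ch, 32) + conv2d_params(5, 32, 64)
    flat = (h // 4) * (w // 4) * 64  # two 2x2 pools: 256 -> 64 per side
    dense_total = dense_params(flat, 128) + dense_params(128, 1)
    return conv_total, dense_total


for name, fn in [("VGG-16", vgg16_params), ("3-layer CNN", small_cnn_params)]:
    conv, dense = fn()
    print(f"{name}: conv={conv:,} dense={dense:,} total={conv + dense:,}")
```

The flattened 8x8x512 feature map feeding a 4096-unit layer is what dominates the VGG count. A pooled head, or fewer Dense units, shrinks it considerably.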
Thank you!
As @CAFEBABE suggested, I tried two approaches. First, I increased the number of epochs to 200, changed the optimiser to SGD and reduced the learning rate to 1e-5. Second, I used pre-trained weights for the VGG-16 model and trained only the last two convolutional layers. Below are the plots for the tuned VGG-16 model, the pre-trained VGG-16 model and the 3-layer CNN model (from top to bottom).
Tuning did affect performance, but only marginally. My guess is that a dataset of ~600 images does not contain enough learnable signal to train this model from scratch. The pre-trained weights helped significantly: that model reached the point of overfitting at ~25 epochs. However, the test accuracies of these two models are similar to that of the 3-layer CNN, ranging between 0.7 and 0.8. I suspect this is again due to the limited size of the dataset.
Thanks again to @CAFEBABE for helping with my problem. I hope this helps others who run into a similar problem.
Looking at the accuracies for what I assume is a binary problem, you can see that the VGG model is just guessing randomly (acc ~ 0.5). The fact that your 3-layer model gives much better results on the training set indicates that you are not training the VGG model long enough for it to overfit. In addition, you do not seem to use a proper initialization of the network. Note: at the beginning of an implementation, overfitting indicates that the training setup works, so it is a good thing in this phase. Therefore, the first step would be to get the model to overfit. You seem to be training from scratch. In that case, it can take a few hundred epochs until the gradients affect the first convolutions of a complex model like VGG16.
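The answer does not say which initialization it means. Keras Conv2D and Dense layers default to `kernel_initializer="glorot_uniform"`. For deep ReLU stacks trained from scratch, He initialization (`kernel_initializer="he_normal"`) is the usual recommendation, and this sketch assumes that reading. It is an idealized stand-in for the conv stack, using plain NumPy with square fully connected ReLU layers and Gaussian weights that have the He and Glorot variances. It measures how the mean-square activation scales with depth. With the Glorot variance, each ReLU layer roughly halves the signal, so after 16 layers it has shrunk by about 2^-16. With the He variance, it stays near its input scale. This only illustrates forward-signal scaling; it is not a diagnosis of the models in the question.

```python
import numpy as np


def relu_signal_ratio(init, depth=16, width=512, batch=1000, seed=0):
    """Mean-square activation after `depth` ReLU layers, relative to the input.

    Idealized stand-in for a deep conv stack: square fully connected layers
    with Gaussian weights whose variance follows He or Glorot.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))
    start = np.mean(x ** 2)
    if init == "he":
        std = np.sqrt(2.0 / width)            # He: Var(W) = 2 / fan_in
    elif init == "glorot":
        std = np.sqrt(2.0 / (width + width))  # Glorot: Var(W) = 2 / (fan_in + fan_out)
    else:
        raise ValueError(f"unknown init: {init}")
    for _ in range(depth):
        W = rng.normal(0.0, std, size=(width, width))
        x = np.maximum(x @ W, 0.0)            # linear layer + ReLU, no bias
    return np.mean(x ** 2) / start


for init in ("glorot", "he"):
    print(init, relu_signal_ratio(init))
```

In Keras, the He variant corresponds to passing `kernel_initializer="he_normal"` to each `Conv2D` and hidden `Dense` layer.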
As the 3-layer CNN seems to overfit quite heavily, I conclude that your dataset is rather small. Hence, I would recommend starting from a pre-trained model (VGG16) and re-training only the last two layers. This should give much better results.
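Here is a minimal sketch of that recommendation. It assumes TensorFlow 2.x / tf.keras and the stock `keras.applications.VGG16` ImageNet weights, and it follows the asker's reading of "last two layers" as the last two convolutional layers (`block5_conv2` and `block5_conv3`). The convolutional base is loaded without its classifier, and every other layer is frozen. The single grayscale channel is repeated three times because the ImageNet weights expect 3-channel input. The function name `build_transfer_model`, the small pooled head (GlobalAveragePooling2D, Dense(256), Dropout(0.5)) and the Adam learning rate of 1e-4 are my assumptions, not taken from either post.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_transfer_model(img_height=256, img_width=256, weights="imagenet",
                         n_trainable_convs=2):
    """VGG16 base with all but the last `n_trainable_convs` conv layers frozen."""
    base = keras.applications.VGG16(
        weights=weights, include_top=False,
        input_shape=(img_height, img_width, 3))
    conv_layers = [l for l in base.layers if isinstance(l, layers.Conv2D)]
    # Freeze the whole base, then unfreeze only the last few conv layers.
    for layer in base.layers:
        layer.trainable = False
    for layer in conv_layers[len(conv_layers) - n_trainable_convs:]:
        layer.trainable = True

    inputs = keras.Input(shape=(img_height, img_width, 1))
    # Repeat the grayscale channel so the input matches the 3-channel weights.
    x = layers.Concatenate()([inputs, inputs, inputs])
    x = base(x)
    # Assumed small head instead of the original two 4096-unit Dense layers.
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Usage would be `model = build_transfer_model()` followed by `model.fit(train_ds, validation_data=val_ds, epochs=n_epochs)`, where `train_ds`, `val_ds` and `n_epochs` are hypothetical. The sketch leaves out input preprocessing. The ImageNet weights were trained on inputs prepared with `keras.applications.vgg16.preprocess_input`, so it is worth checking that the CT slices are scaled comparably.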