ML Model Overfits if input data is normalized

Question

Please help me understand why my model overfits when my input data is normalized to [-0.5, 0.5], whereas it does not overfit otherwise.

I am solving a regression problem: detecting the locations of 4 key points in images. To do that, I import a pretrained ResNet50 and replace its top layer with the following architecture:

Flattening layer right after ResNet
Fully connected (dense) layer with 256 nodes, followed by LeakyReLU activation and Batch Normalization
Another fully connected layer with 128 nodes, also followed by LeakyReLU and Batch Normalization
Last fully connected layer (with 8 nodes), which gives me 8 coordinates (4 Xs and 4 Ys) of the 4 key points
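
For reference, the head described above might look roughly like this in tf.keras. This is only a sketch: the function name build_head and the (7, 7, 2048) feature shape (what ResNet50 without its top produces for 224x224 inputs) are assumptions, and LeakyReLU is left at its default slope because the question does not give one.

```python
import tensorflow as tf


def build_head(feature_shape=(7, 7, 2048)):
    """Regression head: Flatten -> Dense/LeakyReLU/BN x2 -> Dense(8)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=feature_shape),   # ResNet feature map (assumed shape)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(128),
        tf.keras.layers.LeakyReLU(),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(8),              # 4 Xs and 4 Ys, no activation
    ])
```

The last layer is left linear here so that it can produce any value in the [-0.5, 0.5] target range.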

Since I am using Keras, I use ImageDataGenerator to produce the flow of data (images). Since the output of my model (8 numbers: 2 coordinates for each of the 4 key points) is normalized to the [-0.5, 0.5] range, I decided that the input to my model (images) should also be in this range, and therefore normalized it to the same range using preprocessing_function in Keras' ImageDataGenerator.
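
A preprocessing function of this kind could be as small as the sketch below. The name scale_to_half_range is hypothetical, and it assumes the generator hands over raw pixel values in [0, 255]; the question does not show the actual function.

```python
import numpy as np


def scale_to_half_range(image):
    """Map pixel values from [0, 255] to [-0.5, 0.5]."""
    image = np.asarray(image, dtype=np.float32)
    return image / 255.0 - 0.5
```

It would be passed as ImageDataGenerator(preprocessing_function=scale_to_half_range), which applies it to each image after resizing and augmentation.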

The problem appeared right after I started training. I froze the entire ResNet (training = False), intending to first train the top layers to a reasonable state and only then unfreeze half of ResNet and fine-tune the model. With ResNet frozen, I noticed that my model starts overfitting after just a couple of epochs. Surprisingly, this happens even though my dataset is decent in size (25k images) and Batch Normalization is employed.

Even more surprisingly, the problem disappears completely if I drop the [-0.5, 0.5] input normalization and preprocess images with tf.keras.applications.resnet50.preprocess_input instead. This preprocessing method DOES NOT scale the image data to a fixed range (it converts RGB to BGR and subtracts the ImageNet per-channel means), yet to my surprise it leads to proper model training without any overfitting.
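
To make the difference between the two input ranges concrete, here is a NumPy sketch of what tf.keras.applications.resnet50.preprocess_input does to an RGB image. The function name resnet50_style_preprocess is made up for illustration, and it is a re-implementation for inspection, not the library code itself.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def resnet50_style_preprocess(rgb_image):
    """Flip RGB to BGR and subtract the ImageNet channel means (no rescaling)."""
    bgr = np.asarray(rgb_image)[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN
```

For inputs in [0, 255] the result spans roughly [-124, 151], which is the scale the pretrained ResNet weights were trained on. Inputs squeezed into [-0.5, 0.5] are a couple of hundred times smaller, so a frozen backbone may well produce features that are far less informative.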

I tried Dropout with different probabilities and L2 regularization. I also tried reducing the complexity of my model by reducing the number of top layers and the number of nodes in each. I experimented with the learning rate and batch size. Nothing really helped when my input data is normalized, and I have no idea why this happens.

IMPORTANT NOTE: when VGG is employed instead of ResNet everything seems to work well!

I really want to figure out why this happens.

UPD: the problem had two causes:
- the batch normalization layers within ResNet didn't work properly when frozen
- image preprocessing for ResNet should be done using Z-score

After the two fixes mentioned above, everything seems to work well!
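
The update does not say how the frozen batch normalization layers were fixed. One commonly documented approach in tf.keras is to freeze the backbone and also call it with training=False, so that its BatchNormalization layers use their stored moving statistics instead of per-batch statistics. The sketch below assumes that approach. The function name and the single Dense(8) layer standing in for the full head are placeholders, not the author's code.

```python
import tensorflow as tf


def build_frozen_backbone_model(input_shape=(224, 224, 3), weights="imagenet"):
    """ResNet50 backbone frozen and run in inference mode, plus a stand-in head."""
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze every backbone weight, BN layers included

    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps BatchNormalization on its moving mean and variance.
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(8)(x)  # placeholder for the full head
    return tf.keras.Model(inputs, outputs)
```

Images would then be fed through ImageDataGenerator(preprocessing_function=tf.keras.applications.resnet50.preprocess_input) rather than a custom [-0.5, 0.5] scaling.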

