Nagabhushan S N

Reputation: 7267

What is the role of preprocess_input() function in Keras VGG model?

This question is a follow-up to the discussion in the comments of this answer.

From what I understand, the preprocess_input() function performs mean subtraction on the input images. The means are those computed on the ImageNet-1K database when training VGG.

But this answer says that when using VGG features as a loss function, preprocess_input() is not required and we just need to normalize the image to [0,1] range before passing to VGG. This confuses me...

  1. If we don't preprocess, then the input will be in a different range than the images used to train VGG. How are the VGG features still valid?
  2. From what I understand from this answer, we should have images in the [0,255] range, and the preprocess_input() function takes care of the normalization. From the source code, I understand that for caffe models, normalization to the [0,1] range is not done; instead, the channel means are subtracted (there is no std-dev division). How would just normalizing the network output to the [0,1] range, as suggested in the comments of this answer, achieve the same thing?
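For reference, here is a minimal NumPy sketch of what the 'caffe' mode of preprocess_input (the mode the Keras VGG models use) does, assuming a float32 RGB batch in [0, 255]:

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras 'caffe' mode.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def caffe_preprocess(rgb_batch):
    """Replicate Keras preprocess_input (mode='caffe') on a float32 RGB
    batch in [0, 255]: flip channels to BGR, then subtract the channel
    means. There is no rescaling to [0, 1] and no std-dev division."""
    bgr = rgb_batch[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN

img = np.random.uniform(0, 255, (2, 8, 8, 3)).astype("float32")
out = caffe_preprocess(img)
print(out.min(), out.max())  # bounded by [-123.68, 151.06]
```

The output range of roughly [-123.68, 151.06] matches the empirical observation below.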

Edit 1:
I'm considering models that output images; this is not specific to a single model. One example is an image-denoising network: the input to my network is a noisy image and its output is a denoised image. I want to minimize the MSE between the denoised image and the ground-truth image in VGG feature space. Whatever the range of my network's output, I can easily change it to [0,255] by multiplying by appropriate factors. Similarly, I can apply any required preprocessing to my network's output (subtract the mean, and so on).

Empirically, I found that the output of the preprocess function is in the approximate range [-128, 151], so the VGG network was trained on inputs in this range. Now, if I feed it images (or tensors from my network's output) in the range [0,1], the convolutions would be fine, but wouldn't the biases cause a problem? To elaborate: for inputs in the range [-128, 151], a layer of the VGG network may have learnt a bias of 5. When I feed an image in the range [0,1] to the VGG network, that bias disrupts everything, right?
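To make the concern concrete, a toy numeric sketch with made-up weight and bias values:

```python
# Toy illustration of the bias/scale mismatch: a "feature" learned as
# w*x + b on inputs of magnitude ~100 is dominated by b when x is in [0, 1].
w, b = 0.05, 5.0          # hypothetical learned weight and bias

x_trained_scale = 100.0   # typical magnitude after preprocess_input
x_wrong_scale = 0.5       # typical magnitude of a [0, 1] image

print(w * x_trained_scale + b)  # 10.0 -- signal and bias are comparable
print(w * x_wrong_scale + b)    # 5.025 -- output is almost entirely bias
```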

I'm not training the VGG model. I'm using the weights from the model trained on the ImageNet-1K database.

Upvotes: 5

Views: 3369

Answers (2)

Zabir Al Nazi Nabil

Reputation: 11198

Short answer:

In the referenced question, the OP uses VGG inside a loss function. The loss function receives the output of a model (unfortunately, the OP didn't share that model), and the output usually passes through an activation such as sigmoid, which is why it was mentioned that the output will already be a normalized tensor. preprocess_input, on the other hand, assumes images in the 0-255 range, so using it inside the loss is a complete waste: you can't meaningfully apply a normalization intended for images in the 0-255 range to tensors already in the 0-1 range (after activation). That's the basic reason it was discouraged to use the function inside the loss (also, you can't put non-differentiable operations inside a loss).

But what you can do is change the range of your tensors to match what the actual VGG preprocessing produces (rescale back to pixel scale and subtract the channel means).
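A sketch of that idea, assuming the model outputs RGB tensors in [0, 1] (written in NumPy here; the same element-wise operations are differentiable when expressed with backend tensor ops, so the recipe can live inside a loss):

```python
import numpy as np

# ImageNet channel means in BGR order, matching Keras 'caffe' mode.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")

def to_vgg_range(x01):
    """Map a [0, 1] RGB tensor onto the range preprocess_input would
    produce: rescale to [0, 255], flip RGB -> BGR, subtract the means."""
    x255 = x01 * 255.0
    bgr = x255[..., ::-1]
    return bgr - BGR_MEAN

x = np.random.uniform(0, 1, (1, 4, 4, 3)).astype("float32")
y = to_vgg_range(x)
print(y.min(), y.max())  # falls inside [-123.68, 151.06]
```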

Are the features valid? That depends entirely on the task being solved and on the model. Even with all those details, it would take some research and inspection of the weights to decide.

Upvotes: -2

Dr. Snoopy

Reputation: 56357

In general, you should not ignore or change the normalization of the data on which a model was trained. It can break the model in unexpected ways, and since you are using the features inside another learning model, it may appear to work while hiding a change in performance.

This is especially true for models that use saturating activations; for example, with ReLU you may get more zero activations than you would with properly normalized data.
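A toy illustration of that effect, with made-up weight and bias values:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

# Hypothetical parameters learned on preprocessed inputs (~[-128, 151]).
w, b = 0.02, -1.0

x_expected = rng.uniform(-128, 151, 10000)  # the scale the layer saw in training
x_unnorm = rng.uniform(0, 1, 10000)         # un-preprocessed [0, 1] inputs

# Fraction of dead (zero) activations under each input scale.
dead_expected = (relu(w * x_expected + b) == 0).mean()
dead_unnorm = (relu(w * x_unnorm + b) == 0).mean()
print(dead_expected, dead_unnorm)  # the [0, 1] inputs are entirely zeroed out
```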

Answer to your specific questions:

  1. Yes, the features would be in a different range than for standard VGG inputs; whether they are valid is another issue, and there is a performance loss since the expected normalization was not used.

  2. Changing the normalization scheme does not produce the same kind of normalization as the original, so it does not achieve the same result. The code in the answer works, but conceptually it is not doing the right thing.

Upvotes: 5
