Reputation: 1074
I am using Keras (version 2.0.0) and I'd like to make use of pretrained models like e.g. VGG16. In order to get started, I ran the example of the [Keras documentation site ][https://keras.io/applications/] for extracting features with VGG16:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
The used preprocess_input()
function bothers me
(the function does Zero-centering by mean pixel what can be seen by looking at the source code).
Do I really have to preprocess input data (validation/test data) before using a trained model?
a) If yes, one can conclude that you always have to be aware of what preprocessing steps have been performed during training phase?!
b) If no: Does preprocessing of validation/test data cause a bias?
I appreciate your help.
Upvotes: 3
Views: 1444
Reputation: 628
Yes you should use the preprocessing step. You can retrain the model without it but the first layers will learn to center your datas so this is a waste of parameters.
If you do not recenter your performances will suffer.
Great thread on reddit : https://www.reddit.com/r/MachineLearning/comments/3q7pjc/why_is_removing_the_mean_pixel_value_from_each/
Upvotes: 1