Joey Carson

Reputation: 3123

TensorFlow per_image_standardization vs mean standardization across full dataset

I am curious about the difference between standardizing each image individually vs standardizing across the full data set.

I am using tensorflow/models/official/resnet, which is built on tf.estimator. The tf.estimator API takes an input pipeline function that produces a tf.data Dataset. That Dataset applies the tf.image.per_image_standardization op, which standardizes each image by subtracting its own mean from every pixel and scaling it to unit variance.
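For reference, the standardization step in my input pipeline looks roughly like the sketch below (heavily simplified; the decode step is just a stand-in for the real TFRecord parsing in the official resnet code, so those details are my own):

```python
import tensorflow as tf

def input_fn(filenames, batch_size=32):
    """Simplified version of my input pipeline."""

    def parse_and_standardize(serialized):
        # Stand-in for the real parsing: treat each record as a raw JPEG
        # string and decode it to float32 (the real code parses tf.Example
        # protos and also resizes/crops, omitted here).
        image = tf.image.decode_jpeg(serialized, channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        # Standardize using the image's OWN mean and stddev.
        return tf.image.per_image_standardization(image)

    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(parse_and_standardize)
    return dataset.batch(batch_size)
```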

This is different from other ML preprocessing that standardizes the image based on the mean across the whole dataset, such as with sklearn.preprocessing.StandardScaler.
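To make the contrast concrete, this is the kind of dataset-wide standardization I'm referring to (toy shapes, purely for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training set: 1000 flattened 32x32x3 images.
train_images = np.random.rand(1000, 32 * 32 * 3)

# StandardScaler learns a mean and stddev PER FEATURE (i.e. per pixel
# position) from the whole training set...
scaler = StandardScaler().fit(train_images)

# ...and every image, at train or serving time, is transformed with
# those same dataset-wide statistics.
standardized = scaler.transform(train_images)
```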

I'm confused as to whether any aspect of this input pipeline is persisted in the tf SavedModel exported from the tf.estimator.Estimator.

So I'm wondering whether I still need to apply feature standardization when serving the model, either via tf.contrib.predictor or when deploying the model in some other DNN format.

Should I be applying standardization across the dataset even though I'm using per_image_standardization? If so, should I export the mean of the whole image set somehow, so that when serving the model the server can pick up that dataset-wide mean and apply the standardization itself?
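In other words, I picture something along these lines, where the file name and the per-channel statistics are just my guess at how it could be wired up:

```python
import numpy as np

# At training/export time: compute dataset-wide, per-channel statistics
# (train_images stands in for my real training set).
train_images = np.random.rand(1000, 224, 224, 3).astype(np.float32)
mean = train_images.mean(axis=(0, 1, 2))
std = train_images.std(axis=(0, 1, 2))
np.savez("preprocessing_stats.npz", mean=mean, std=std)

# At serving time: load the saved statistics and standardize each
# incoming image before feeding it to the predictor.
stats = np.load("preprocessing_stats.npz")
new_image = np.random.rand(224, 224, 3).astype(np.float32)
standardized = (new_image - stats["mean"]) / stats["std"]
```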

Upvotes: 1

Views: 3675

Answers (1)

Ankish Bansal

Reputation: 1900

In StandardScaler, we do feature-wise normalization. For images, we could do pixel-wise normalization using statistics computed over the entire data distribution, but that is not very helpful because the distribution varies a lot from image to image. So per_image_standardization is preferred: it normalizes each image to mean zero and standard deviation one, which also makes learning faster.
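You can check this behaviour quickly (TF 2.x eager style here just for brevity; the input is random):

```python
import numpy as np
import tensorflow as tf

# Each image is standardized against its own statistics, so the result
# always has mean ~0 and stddev ~1, no matter what the rest of the
# dataset looks like.
image = np.random.uniform(0, 255, size=(32, 32, 3)).astype(np.float32)
standardized = tf.image.per_image_standardization(image)

print(float(tf.reduce_mean(standardized)))      # ~0.0
print(float(tf.math.reduce_std(standardized)))  # ~1.0
```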

Further, this link can be helpful. There is another link where the author explains this with an example.

Upvotes: 2
