CamHart

Reputation: 4335

Why does the TensorFlow Lite example use IMAGE_MEAN and IMAGE_STD when adding pixel values to the array?

Looking at https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/android/app/src/main/java/org/tensorflow/lite/examples/classification/tflite/ClassifierFloatMobileNet.java, can you help me understand why they subtract IMAGE_MEAN and divide by IMAGE_STD?

  private static final float IMAGE_MEAN = 127.5f;
  private static final float IMAGE_STD = 127.5f;

  //...

@Override
  protected void addPixelValue(int pixelValue) {
    imgData.putFloat((((pixelValue >> 16) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
    imgData.putFloat((((pixelValue >> 8) & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
    imgData.putFloat(((pixelValue & 0xFF) - IMAGE_MEAN) / IMAGE_STD);
  }
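
To make the arithmetic concrete, here is a minimal, self-contained sketch (not part of the example app) that applies the same (value - IMAGE_MEAN) / IMAGE_STD formula to a few channel values:

  public class NormalizeDemo {
    private static final float IMAGE_MEAN = 127.5f;
    private static final float IMAGE_STD = 127.5f;

    public static void main(String[] args) {
      // Each color channel is 8 bits, so values span 0..255.
      int[] samples = {0, 127, 128, 255};
      for (int v : samples) {
        // Same formula as addPixelValue: shift by the mean, scale by the std.
        float normalized = (v - IMAGE_MEAN) / IMAGE_STD;
        System.out.printf("%3d -> %+.4f%n", v, normalized);
      }
      // Prints roughly: 0 -> -1.0000, 127 -> -0.0039, 128 -> +0.0039, 255 -> +1.0000
    }
  }

So 0 maps to -1, 255 maps to +1, and the midpoint 127.5 maps to 0.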

You'll notice it's not necessary for the Quantized example (see https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/android/app/src/main/java/org/tensorflow/lite/examples/classification/tflite/ClassifierQuantizedMobileNet.java).

@Override
  protected void addPixelValue(int pixelValue) {
    imgData.put((byte) ((pixelValue >> 16) & 0xFF));
    imgData.put((byte) ((pixelValue >> 8) & 0xFF));
    imgData.put((byte) (pixelValue & 0xFF));
  }
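
The quantized model can take the raw bytes directly because its input tensor is uint8: the quantization parameters (a scale and a zero point) are stored in the model itself, and the interpreter uses them to map raw bytes to real values. Below is a sketch of that standard TFLite affine mapping; the scale and zeroPoint values are illustrative assumptions, as the real ones come from the model file:

  public class DequantizeDemo {
    public static void main(String[] args) {
      // Illustrative quantization parameters; a real model ships its own
      // scale and zero point alongside each quantized tensor.
      float scale = 1.0f / 128.0f;
      int zeroPoint = 128;

      int quantized = 200; // a raw uint8 channel value, passed in unmodified
      // Standard TFLite affine dequantization: real = scale * (q - zeroPoint)
      float real = scale * (quantized - zeroPoint);
      System.out.println(real); // 0.5625, close to (200 - 127.5) / 127.5
    }
  }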

Rough thoughts so far....

  • 127.5 = 255 / 2. Pixels are frequently represented as colors using a range from 0-255. This is exactly the middle of that range. So every pixel color is being adjusted to be between -1 and 1...

Upvotes: 4

Views: 3573

Answers (1)

miaout17

Reputation: 4875

> 127.5 = 255 / 2. Pixels are frequently represented as colors using a range from 0-255. This is exactly the middle of that range. So every pixel color is being adjusted to be between -1 and 1...

This is exactly correct.

> but why?

Input normalization is a common technique in machine learning. This specific model was trained with an input value range of -1 to 1, so we should normalize the inference input to the same range to get the best results.

To give some intuition, here is what can go wrong if the input isn't normalized to -1 to 1 (a short sketch after this list runs the numbers):

  • For example, if we accidentally set IMAGE_MEAN = 0.0f and IMAGE_STD = 255.0f, the input will be normalized to 0 to 1. The model will still "see" the image, but everything becomes brighter, and the accuracy may drop a bit.
  • If we don't normalize at all and simply convert uint8 to float, the value range is 0~255 when the model expects -1~1. The model may "see" a super bright / white image, and the accuracy may drop significantly or the model may not work at all.
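
A minimal sketch of the three scenarios above for a single channel value (200 is just an arbitrary example):

  public class NormalizationScenarios {
    public static void main(String[] args) {
      int v = 200; // one 8-bit channel value

      // Correct for this model: normalized into -1..1
      float correct = (v - 127.5f) / 127.5f; // 0.5686
      // Accidental IMAGE_MEAN = 0.0f, IMAGE_STD = 255.0f: normalized into 0..1
      float brighter = v / 255.0f;           // 0.7843
      // No normalization: a raw 0..255 value fed to a model expecting -1..1
      float raw = (float) v;                 // 200.0

      System.out.printf("%.4f %.4f %.1f%n", correct, brighter, raw);
    }
  }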

The range can be arbitrary; -1~1 and 0~1 are often used. The point is that the same normalization must be applied at both training and inference.

Upvotes: 10
