Sherlock

Reputation: 161

PyTorch: Normalize image dataset

I want to normalize a custom dataset of images. For that I need to compute the mean and standard deviation by iterating over the dataset. How can I normalize my entire dataset before creating the dataset?

Upvotes: 2

Views: 4075

Answers (2)

SalvadorViramontes

Reputation: 580

Well, let's take this image as an example:

Test image: Lena

The first thing you need to do is decide which library you want to use: Pillow or OpenCV. In this example I'll use Pillow:

from PIL import Image
import numpy as np

img = Image.open("test.jpg")
pix = np.asarray(img.convert("RGB"))  # Convert to RGB and load as an H x W x 3 array

Rchan = pix[:,:,0]  # Red color channel
Gchan = pix[:,:,1]  # Green color channel
Bchan = pix[:,:,2]  # Blue color channel

Rchan_mean = Rchan.mean()
Gchan_mean = Gchan.mean()
Bchan_mean = Bchan.mean()

Rchan_var = Rchan.var()
Gchan_var = Gchan.var()
Bchan_var = Bchan.var()

And the results are:

  • Red Channel Mean: 134.80585625
  • Red Channel Variance: 3211.35843945
  • Green Channel Mean: 81.0884125
  • Green Channel Variance: 1672.63200823
  • Blue Channel Mean: 68.1831375
  • Blue Channel Variance: 1166.20433566
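Since the question asks about a whole dataset rather than a single image, here is a minimal sketch of how the same per-channel statistics could be accumulated over many images. The function name `channel_stats` is hypothetical, and scaling pixels to [0, 1] is an assumption made so the result matches what `torchvision.transforms.ToTensor` produces; the standard deviation is simply the square root of the variance computed above.

```python
import numpy as np

def channel_stats(images):
    """Return per-channel (mean, std) over an iterable of H x W x 3 uint8 arrays.

    Hypothetical helper: it keeps running sums so the whole dataset
    never has to be held in memory at once.
    """
    total = np.zeros(3, dtype=np.float64)     # sum of pixel values per channel
    total_sq = np.zeros(3, dtype=np.float64)  # sum of squared pixel values per channel
    n_pixels = 0

    for img in images:
        # Assumption: scale to [0, 1] to match tensors produced by ToTensor()
        pix = np.asarray(img, dtype=np.float64) / 255.0
        flat = pix.reshape(-1, 3)             # one row per pixel, one column per channel
        total += flat.sum(axis=0)
        total_sq += (flat ** 2).sum(axis=0)
        n_pixels += flat.shape[0]

    mean = total / n_pixels
    var = total_sq / n_pixels - mean ** 2     # Var[x] = E[x^2] - E[x]^2
    std = np.sqrt(np.maximum(var, 0.0))       # clamp tiny negative rounding errors
    return mean, std
```

The images could come from a generator that opens each file with Pillow as shown above. The resulting `mean` and `std` are the kind of values that would then be passed to `torchvision.transforms.Normalize(mean, std)` after `ToTensor()`.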

Hope this helps.

Upvotes: 2

SalvadorViramontes

Reputation: 580

Normalization tries to maintain the overall information in your dataset even when the values differ. For images, it tries to factor out properties such as brightness and contrast, which in some cases do not contribute to the information the image carries. There are several ways to do this, each with pros and cons, depending on the image set you have and the processing effort you want to spend on it. To name a few:

  • Linear histogram stretching: you apply a linear map to the current range of values in your image and stretch it to span the full 0 to 255 range in RGB.
  • Nonlinear histogram stretching: you use a nonlinear function to map the input pixels to a new image. Commonly used functions are logarithms and exponentials. My favorite function is the cumulative distribution function of the original histogram; it works pretty well.
  • Adaptive histogram equalization: you apply linear histogram stretching to local regions of your image, which avoids ending up with an identity mapping when the original image already spans the maximum range of values.
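The first two approaches can be sketched with NumPy alone. This is an illustrative sketch, not a reference implementation: the function names `linear_stretch` and `equalize` are hypothetical, each operates on a single 8-bit channel, and the adaptive variant is left out because it needs a tiling scheme that the description above does not specify.

```python
import numpy as np

def linear_stretch(channel):
    """Linearly map the channel's [min, max] range onto [0, 255]."""
    c = np.asarray(channel, dtype=np.float64)
    lo, hi = c.min(), c.max()
    if hi == lo:
        # A flat channel has no range to stretch; return it unchanged
        return np.asarray(channel, dtype=np.uint8).copy()
    return np.round((c - lo) / (hi - lo) * 255.0).astype(np.uint8)

def equalize(channel):
    """Nonlinear mapping that uses the cumulative histogram as the transfer function."""
    c = np.asarray(channel, dtype=np.uint8)
    hist = np.bincount(c.ravel(), minlength=256)  # pixel count per intensity level
    cdf = hist.cumsum() / c.size                  # cumulative fraction in [0, 1]
    lut = np.round(cdf * 255.0).astype(np.uint8)  # lookup table: old level -> new level
    return lut[c]
```

For an RGB image, one option is to apply either function to each channel separately, for example `linear_stretch(pix[:, :, 0])` for the red channel; whether per-channel processing is appropriate depends on how much color shift you can tolerate.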

Upvotes: 2
