Feature scaling using python StandardScaler produces negative values

Question

I am new to machine learning. I am trying to apply feature scaling to my training and test data using the StandardScaler class from scikit-learn in Python. However, when I look at the scaled values, some of them are negative even though none of the input values are negative. Is this normal, or am I missing something in my code? The relevant code used for feature scaling is given below.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
train = sc.fit_transform(train)  # train contains the training feature matrix
test = sc.transform(test)        # test contains the test feature matrix

Pavel · Accepted Answer

From the docs:

Standardize features by removing the mean and scaling to unit variance

This means that each input value x is transformed to (x - mean) / std, where mean and std are the mean and standard deviation of that feature (column), computed on the data passed to fit.

So even if your input values are all positive, removing the mean can make some of them negative:

>>> import numpy as np
>>> x = np.array([3,5,7])
>>> np.mean(x)
5.0
>>> x - np.mean(x)
array([-2.,  0.,  2.])
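
To see the full transform rather than just the mean subtraction, here is a minimal sketch that compares StandardScaler against the formula computed by hand. The toy arrays `train` and `test` are made up for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three samples, two all-positive features.
train = np.array([[3.0, 10.0],
                  [5.0, 20.0],
                  [7.0, 30.0]])
# Hypothetical test sample, scaled with the statistics learned from train.
test = np.array([[4.0, 40.0]])

sc = StandardScaler()
scaled_train = sc.fit_transform(train)  # learns per-column mean and std, then scales
scaled_test = sc.transform(test)        # reuses the training mean and std

# The same transform by hand: (x - mean) / std, column by column.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also the default of NumPy's std().
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_train = (train - mean) / std
manual_test = (test - mean) / std

print(scaled_train)                             # values below the column mean are negative
print(np.allclose(scaled_train, manual_train))  # the two computations agree
print(scaled_train.mean(axis=0))                # approximately 0 for each column
print(scaled_train.std(axis=0))                 # approximately 1 for each column
print(scaled_test)
```

Each scaled training column has mean 0, so unless a column is constant, some of its values have to fall below zero. Test values can also land outside the range seen in training, because they are scaled with the training mean and standard deviation.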
