Ivan Bilan

Reputation: 2439

StandardScaler in sklearn not fitting properly, or is it?

I am using StandardScaler from sklearn to scale my feature vector, but it doesn't seem to fit the training feature vector properly. Or maybe this is expected behavior, but if it is, could someone explain why (preferably with some mathematical explanation too)?

from sklearn.preprocessing import StandardScaler
import numpy as np

scale_inst = StandardScaler()

# train feature vector (reshaped to a single-column 2D array, as sklearn expects)
x1 = np.array([1, 2, 10, 44, 55]).reshape(-1, 1)
# test feature vector
x2 = np.array([1, 2, 10, 44, 667]).reshape(-1, 1)

# first I fit on the training vector
scale_inst.fit(x1)
# then I transform the training vector and the test vector
print(scale_inst.transform(x1).ravel())
print(scale_inst.transform(x2).ravel())

# OUTPUT
[-0.94627295 -0.90205459 -0.54830769  0.95511663  1.44151861]
[ -0.94627295  -0.90205459  -0.54830769   0.95511663  28.50315638]

Why does it scale 667 to 28.50315638? Shouldn't it be scaled to 1.44151861, i.e. the value assigned to the max of the training feature vector?

Upvotes: 0

Views: 3621

Answers (2)

mtzl

Reputation: 404

From the StandardScaler API:

Standardize features by removing the mean and scaling to unit variance

The scaler was fitted on x1, so it uses the mean and standard deviation of x1 in both transforms. What it computes is simply:

>>> (x1 - np.mean(x1)) / np.std(x1)
array([-0.94627295, -0.90205459, -0.54830769,  0.95511663,  1.44151861])

>>> (x2 - np.mean(x1)) / np.std(x1)
array([ -0.94627295,  -0.90205459,  -0.54830769,   0.95511663, 28.50315638])
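For completeness, the statistics that the transform applies can be read directly off the fitted instance via its `mean_` and `scale_` attributes (a small sketch; the variable name `scaler` is my own):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x1 = np.array([1, 2, 10, 44, 55]).reshape(-1, 1)

scaler = StandardScaler().fit(x1)
# mean_ and scale_ hold the statistics learned from x1;
# transform computes (x - mean_) / scale_
print(scaler.mean_)   # [22.4]
print(scaler.scale_)  # the (population) std of x1
```

These are exactly the `np.mean(x1)` and `np.std(x1)` used in the manual computation above.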

You are probably looking for what Sagar proposed.

Upvotes: 3

Sagar Waghmode

Reputation: 777

It is behaving correctly. For your use case, you can use MinMaxScaler or MaxAbsScaler, which scale the training data into [0, 1] or [-1, 1] respectively.
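A minimal sketch of this suggestion (variable names are my own): MinMaxScaler fitted on the training vector maps the training values into [0, 1]. Note, though, that a test value beyond the training maximum still maps above 1, because the min/max are learned from the fit data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x1 = np.array([1, 2, 10, 44, 55]).reshape(-1, 1)   # train
x2 = np.array([1, 2, 10, 44, 667]).reshape(-1, 1)  # test

mm = MinMaxScaler().fit(x1)
# training values land in [0, 1]: (x - min(x1)) / (max(x1) - min(x1))
print(mm.transform(x1).ravel())
# 667 exceeds the training max (55), so it maps above 1
print(mm.transform(x2).ravel())
```

To guarantee the test data is also bounded, you would need to fit on the combined data (or clip the transformed values), which leaks test statistics into the scaler.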

Upvotes: 2
