sklearn StandardScaler returns all zeros

Question

I have a sklearn StandardScaler saved from a previous model and am trying to apply it to new data:

scaler = myOldStandardScaler
print("ORIG:", X)
print("CLASS:", X.__class__)
X = scaler.fit_transform(X)
print("SCALED:", X)

I have three observations each with 2000 features. If I run each observation separately I get an output of all zeros.

ORIG: [[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: 
SCALED: [[ 0.  0.  0. ...,  0.  0.  0.]]

But if I append all three observations into one array, I get the results I want:

ORIG: [[  0.00000000e+00   8.69737728e-08   7.53361877e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  9.49627142e-04   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: 
SCALED: [[-1.07174217  1.41421356  1.37153077 ...,  0.          0.          0.        ]
[ 1.33494964 -0.70710678 -0.98439142 ...,  0.          0.          0.        ]
[-0.26320747 -0.70710678 -0.38713935 ...,  0.          0.          0.        ]]

I've seen two related questions, neither of which has an accepted answer.

I've tried:

reshaping from (1,n) to (n,1) (this gives incorrect results)
converting the array to np.float32 and np.float64 (still all zero)
creating an array of an array (again, all zero)
creating a np.matrix (again, all zeros)

What am I missing? The input to fit_transform is getting the same type, just a different size.

How do I get StandardScaler to work with a single observation?

Eduard Ilyasov · Accepted Answer

When you call fit_transform on an array of shape (1, n), you get all zeros because StandardScaler standardizes each feature (column) independently. With a single row, every column holds exactly one value, so the column mean equals that value and subtracting it gives zero; the column's standard deviation is zero as well, and scikit-learn leaves the result at 0 instead of dividing by it. If you want to standardize the values of that one array against each other, reshape it to (n, 1). You can do it this way:

import numpy as np

X = np.array([1, -4, 5, 6, -8, 5])  # replace with your own X as a 1-D np.array
X_transformed = scaler.fit_transform(X[:, np.newaxis])

In this case you standardize the features of one object against each other, which is not what you're looking for.
If you want to scale one feature across the 3 objects, pass fit_transform an array of shape (3, 1) holding that feature's value for each object:

X = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])
X_transformed = scaler.fit_transform(X[:, np.newaxis])
# expected: array([[-1.07174217], [1.33494964], [-0.26320747]]),
# i.e. the first column of the SCALED output you're looking for
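To make the per-column behaviour concrete, here is a minimal sketch (assuming NumPy and scikit-learn are installed; the variable names `col`, `manual`, `scaled` and `zeros` are illustrative only). It compares StandardScaler with a hand-written (x - mean) / std on the three values above, then shows what happens when the same values are passed as a single row:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature observed on three objects (first column of the question's data)
col = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])

# Standardize by hand: subtract the column mean, divide by the population std (ddof=0)
manual = (col - col.mean()) / col.std()

# Same values as a (3, 1) column: StandardScaler gives the same result
scaled = StandardScaler().fit_transform(col[:, np.newaxis]).ravel()

# Same values as a (1, 3) row: each column now has one value, so x - mean is 0
zeros = StandardScaler().fit_transform(col.reshape(1, -1))

print(manual)
print(scaled)
print(zeros)
```

The first two printed arrays should agree, while the last one is all zeros, which is the behaviour described in the question.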

And if you want to use an already fitted StandardScaler object, you shouldn't call fit_transform, because it refits the object on the new data. StandardScaler has a transform method, which works with a single observation:

X = np.array([1, -4, 5, 6, -8, 5])  # replace with your own observation (same number of features the scaler was fitted on)
X_transformed = scaler.transform(X.reshape(1, -1))
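As a rough end-to-end sketch of the fit-once, transform-later pattern (the training data here is randomly generated and purely hypothetical, standing in for whatever the saved scaler was originally fitted on):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 100 observations, 4 features
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 4))

# Fit once; the scaler stores per-feature statistics in mean_ and scale_
scaler = StandardScaler().fit(X_train)
mean_before = scaler.mean_.copy()

# A single new observation, reshaped to (1, n_features) as transform expects 2-D input
x_new = X_train[0]
x_scaled = scaler.transform(x_new.reshape(1, -1))

# transform only applies the stored statistics; it is the same as doing this by hand
manual = (x_new - scaler.mean_) / scaler.scale_

print(x_scaled)
print(np.allclose(x_scaled.ravel(), manual))
```

Because transform never refits, `scaler.mean_` is unchanged afterwards, and the single observation is scaled relative to the training data rather than to itself.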
