Sal

Reputation: 1693

sklearn StandardScaler returns all zeros

I have a sklearn StandardScaler saved from a previous model and am trying to apply it to new data:

scaler = myOldStandardScaler
print("ORIG:", X)
print("CLASS:", X.__class__)
X = scaler.fit_transform(X)
print("SCALED:", X)

I have three observations each with 2000 features. If I run each observation separately I get an output of all zeros.

ORIG: [[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[ 0.  0.  0. ...,  0.  0.  0.]]

But if I append all three observations into one array, I get the results I want:

ORIG: [[  0.00000000e+00   8.69737728e-08   7.53361877e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  9.49627142e-04   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[-1.07174217  1.41421356  1.37153077 ...,  0.          0.          0.        ]
[ 1.33494964 -0.70710678 -0.98439142 ...,  0.          0.          0.        ]
[-0.26320747 -0.70710678 -0.38713935 ...,  0.          0.          0.        ]]

I've seen these two questions:

neither of which have an accepted answer.

I've tried:

What am I missing? The input to fit_transform is getting the same type, just a different size.

How do I get StandardScaler to work with a single observation?

Upvotes: 14

Views: 10802

Answers (2)

DRFeinberg

Reputation: 77

I had the same problem. Another (simpler) solution for an array of size (1, n) is to transpose it, which gives an array of size (n, 1).

X = np.array([[0.00000000e+00, 9.49627142e-04, 3.19029839e-04]])  # 2-D, shape (1, 3); .T does nothing on a 1-D array
X_transformed = scaler.transform(X.T)  # X.T has shape (3, 1)

Upvotes: 2

Eduard Ilyasov

Reputation: 3308

When you apply the fit_transform method of a StandardScaler object to an array of size (1, n), you get all zeros: each feature column holds a single number, so the mean subtracted from it equals the number itself, and the result is then divided by the std of that one number (which is zero, and scikit-learn replaces a zero std with 1). If you want correct scaling of your array, you should convert it to an array of size (n, 1). You can do it this way:

import numpy as np

X = np.array([1, -4, 5, 6, -8, 5]) # here should be your X in np.array format
X_transformed = scaler.fit_transform(X[:, np.newaxis])

In this case you get standard scaling of one object across its features, which is not what you're looking for.
If you want to scale one feature across 3 objects, you should pass fit_transform an array of size (3, 1) holding the values of that feature for each object.

X = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])
X_transformed = scaler.fit_transform(X[:, np.newaxis])
# you should get the values you're looking for:
# array([[-1.07174217], [1.33494964], [-0.26320747]])
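To see where those numbers come from, here is a small sketch that redoes the scaling by hand with NumPy. It assumes StandardScaler's defaults (centering on, and division by the population standard deviation, i.e. ddof=0); the variable names are only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature observed on three objects, shape (3, 1)
X = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])[:, np.newaxis]

# Manual z-score: subtract the column mean, divide by the column std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same thing through a freshly fitted scaler
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual, scaled))
```

With a single row instead of three, X.mean(axis=0) is the row itself, so the numerator is zero everywhere, which is the all-zeros result from the question.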

And if you want to work with an already fitted StandardScaler object, you shouldn't use the fit_transform method, because it refits the object on the new data. StandardScaler has a transform method, which works with a single observation:

X = np.array([1, -4, 5, 6, -8, 5]) # here should be your X in np.array format
X_transformed = scaler.transform(X.reshape(1, -1))
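As a rough illustration of the difference, the sketch below fits a scaler once and then scales one new observation both ways. The training data and the names X_train and x_new are made up for the example; in your case the already fitted scaler would be the one you saved from the previous model.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 3 observations, 4 features
X_train = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 4.0, 6.0, 8.0],
                    [3.0, 6.0, 9.0, 12.0]])
scaler = StandardScaler().fit(X_train)

# A single new observation, reshaped to (1, n_features)
x_new = np.array([2.0, 2.0, 9.0, 4.0]).reshape(1, -1)

# transform() reuses the mean_ and scale_ learned from X_train
reused = scaler.transform(x_new)

# fit_transform() on a fresh scaler refits on the single row: all zeros
refit = StandardScaler().fit_transform(x_new)

print("transform:    ", reused)
print("fit_transform:", refit)
```

Calling fit_transform on the saved scaler itself would also overwrite its stored statistics, which is why the refit here is done on a separate StandardScaler instance.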

Upvotes: 28
