Scaling the target variable is giving error in Python using StandardScaler of Sklearn library

Question

Scaling the target variable with the usual StandardScaler procedure raises an error. The error went away after I added the line y = y.reshape(-1,1); after that, calling fit_transform on the target variable returned the standardized values. I cannot figure out why adding y.reshape(-1,1) made it work.

X is the independent variable with a single feature, and y is the numerical target variable 'Salary'. I was trying to apply Support Vector Regression to the problem, which needs explicit feature scaling. I tried the following code:

from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
sc_y = StandardScaler()

X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

It gave me an error like this:

ValueError: Expected 2D array, got 1D array instead
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

After I made the following changes:

X = sc_X.fit_transform(X)
y = y.reshape(-1,1)
y = sc_y.fit_transform(y)

The standardization worked just fine. I need to understand how adding this y = y.reshape(-1,1) helped achieve it. Thanks.

Itamar Mushkin · Accepted Answer

This comes up a lot in scikit-learn.
According to the docs of the scaler's .transform function, the input to .transform has to be a 2D array whose second dimension is the number of features:

Perform standardization by centering and scaling

Parameters: X : array-like, shape [n_samples, n_features] The data used to scale along the features axis.

Now, the last dimension has to be explicitly set to 1, not missing. Before the data is reshaped (i.e. y=y.reshape(-1,1)), the last dimension is missing - see this example:

import numpy as np
a = np.array([0,0,0])
print(a) # [0 0 0]
print(a.shape) # (3,)
b = a.reshape(-1,1)
print(b) # [[0] [0] [0]]
print(b.shape) # (3, 1)
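To tie this back to the scaler, here is a minimal sketch of the same effect with StandardScaler. The salary values are made up for illustration; only the shapes matter.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 1D target, standing in for the 'Salary' column
y = np.array([45000.0, 50000.0, 60000.0, 80000.0])
print(y.shape)  # (4,)

scaler = StandardScaler()

# A 1D array has no explicit feature axis, so the scaler rejects it
try:
    scaler.fit_transform(y)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", str(err).splitlines()[0])

# With an explicit second dimension of size 1, the same data is accepted
y_2d = y.reshape(-1, 1)
y_scaled = scaler.fit_transform(y_2d)
print(y_scaled.shape)  # (4, 1)
```

The values in y_2d are the same as in y; only the shape differs, and the shape is what the scaler checks.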

The reshape method returns the array's data with a new shape: for example, if a is an array with 6 elements (in whatever shape), a.reshape(3,2) gives a 3-by-2 array. The original a is left unchanged, which is why the result has to be assigned back, as in y = y.reshape(-1,1).
The -1 argument basically means "use whatever size is needed here so that the data fits".
So, a.reshape(-1,1) converts an array with n elements to an n-by-1 2D array (without explicitly specifying n).
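A short sketch of how -1 is inferred, followed by a round trip for a target variable. The array values and the names y_scaled and y_back are illustrative assumptions, not taken from the question. Flattening with ravel() is included because regressors such as SVR generally expect a 1D y when fitting.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

a = np.arange(6)  # 6 elements, shape (6,)

# -1 is replaced by whatever size makes the element count match
print(a.reshape(3, 2).shape)   # (3, 2)
print(a.reshape(-1, 2).shape)  # (3, 2)
print(a.reshape(-1, 1).shape)  # (6, 1)
print(a.reshape(1, -1).shape)  # (1, 6)

# Round trip for a target: make it 2D for the scaler, then flatten again
y = np.array([45000.0, 50000.0, 60000.0, 80000.0])  # hypothetical salaries
sc_y = StandardScaler()
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()  # back to shape (4,)

# Predictions made on the scaled target can be mapped back the same way
y_back = sc_y.inverse_transform(y_scaled.reshape(-1, 1)).ravel()
print(np.allclose(y_back, y))  # True
```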
