Reputation: 583
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np
a = ([1,2,3],[4,5,6])
stan = StandardScaler()
mima = MinMaxScaler()
stan.fit_transform(a)
mima.fit_transform(a)
Results after running stan and mima:
array([[-1., -1., -1.],
[ 1., 1., 1.]])
array([[0., 0., 0.],
[1., 1., 1.]])
However, when I tried to pass a 1-D array like this,
b = np.random.random(10)
stan.fit_transform(b)
mima.fit_transform(b)
I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 517, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 308, in fit
return self.partial_fit(X, y)
File "/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 334, in partial_fit
estimator=self, dtype=FLOAT_DTYPES)
File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[0.05808361 0.86617615 0.60111501 0.70807258 0.02058449 0.96990985
0.83244264 0.21233911 0.18182497 0.18340451].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
There was a thread on GitHub too, but it was closed a long time ago. Is there any way to solve this?
Upvotes: 3
Views: 5155
Reputation: 36599
First, try to understand what MinMaxScaler and StandardScaler do. They scale the values of the data column by column. So if your data has 3 columns:
1) MinMaxScaler finds the minimum and maximum of each column individually and scales the other values of that column according to that minimum and maximum. It does the same for every column.
2) StandardScaler similarly finds the mean and standard deviation of each column separately and then scales that column with them.
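To make the column-wise behaviour concrete, here is a minimal NumPy-only sketch that redoes the arithmetic by hand on the 2x3 array from the question. It is not how scikit-learn is implemented internally; the variable names are made up for illustration, and it assumes the population standard deviation (ddof=0), which is what StandardScaler uses.

```python
import numpy as np

# The 2x3 array from the question: 2 samples (rows), 3 features (columns).
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

# Min-max scaling: each column is mapped to [0, 1] using its own min and max.
col_min = a.min(axis=0)
col_max = a.max(axis=0)
minmax_scaled = (a - col_min) / (col_max - col_min)

# Standardization: each column is centered on its own mean and divided by
# its own population standard deviation (ddof=0).
col_mean = a.mean(axis=0)
col_std = a.std(axis=0)
standard_scaled = (a - col_mean) / col_std

print(minmax_scaled)
print(standard_scaled)
```

Both results match the arrays shown in the question, because every statistic is computed along axis 0, i.e. per column.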
Then, see my answer here for the explanation of why it does not accept a 1-D array.
Now, you are passing a 1-D array to those scalers. How would they know what to scale? How many columns are there? Do you want all 10 values to form a single column, or do you want to treat them as 10 columns that are handled separately from each other? In either case, it's you who has to reshape the data accordingly; scikit-learn will not do that for you.
1) If you want them to be a single column, reshape like this:
# The first value (-1) is the number of rows and the second (1) is the number of columns.
# -1 means "infer this dimension", so you get exactly 1 column
# and the row count is worked out automatically (10 in this case).
b = b.reshape(-1, 1)
2) If you want these 10 values to be a single row with 10 columns, do this:
b = b.reshape(1, -1)
Then you can do this:
stan.fit_transform(b)
But observe that the results will be different in each case.
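As a rough sketch of how different the results are, the snippet below runs both scalers on the same 1-D data reshaped each way. The five input values and the variable names are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative 1-D data (made-up values).
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

as_column = b.reshape(-1, 1)  # shape (5, 1): one feature, five samples
as_row = b.reshape(1, -1)     # shape (1, 5): five features, one sample

# Single column: the five values are scaled relative to each other.
col_minmax = MinMaxScaler().fit_transform(as_column)
col_standard = StandardScaler().fit_transform(as_column)

# Single row: every column contains exactly one value, so there is
# nothing to compare it against within its column.
row_minmax = MinMaxScaler().fit_transform(as_row)
row_standard = StandardScaler().fit_transform(as_row)

print(col_minmax.ravel())
print(col_standard.ravel())
print(row_minmax)
print(row_standard)
```

With a single column, the values are spread over [0, 1] by MinMaxScaler and get zero mean and unit variance from StandardScaler. With a single row, each column's minimum, maximum and mean all equal its one value, so both scalers return all zeros. For a plain list of numbers, reshape(-1, 1) is therefore usually the one you want.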
Upvotes: 7