rpb

Reputation: 3299

How to normalize 2D array with sklearn?

Given a 2D array, I would like to normalize it to the range 0-1.

I know this can be achieved as below:

import numpy as np
from sklearn.preprocessing import normalize,MinMaxScaler

np.random.seed(0)
t_feat=4
t_epoch=3
t_wind=2

result = [np.random.rand(t_epoch, t_feat) for _ in range(t_wind)]
wdw_epoch_feat=np.array(result)
matrix=wdw_epoch_feat[:,:,0]

xmax, xmin = matrix.max(), matrix.min()
x_norm = (matrix - xmin)/(xmax - xmin)

which produces

[[0.55153917 0.42094786 0.98439526], [0.57160496 0.         1.        ]]

However, I cannot get the same result using sklearn's MinMaxScaler:

scaler = MinMaxScaler()
x_norm = scaler.fit_transform(matrix)

which produces

[[0. 1. 0.], [1. 0. 1.]]

I would appreciate any thoughts.

Upvotes: 0

Views: 1937

Answers (2)

Akshay Sehgal

Reputation: 19312

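MinMaxScaler scales each column separately, so one way to scale the whole array with a single global min and max is to reshape your data into one column, apply the transform, and reshape it back to the original shape: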

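import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-1, 2], [-0.5, 6]])
scaler = MinMaxScaler()
# Reshape to a single column so the scaler treats all values as one feature
X_one_column = X.reshape([-1,1])
result_one_column = scaler.fit_transform(X_one_column)
# Restore the original 2D shape
result = result_one_column.reshape(X.shape)
print(result)

which prints

[[ 0.          0.42857143]
 [ 0.07142857  1.        ]]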
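
As a rough check, the same reshape trick can be applied to the matrix from the question and compared against the manual global min-max formula. This is only a sketch: it rebuilds the question's array with the same seed, and the names `manual` and `scaled` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

np.random.seed(0)
# Same construction as in the question: 2 windows, 3 epochs, 4 features
wdw_epoch_feat = np.array([np.random.rand(3, 4) for _ in range(2)])
matrix = wdw_epoch_feat[:, :, 0]

# Manual scaling with the global min and max of the whole matrix
manual = (matrix - matrix.min()) / (matrix.max() - matrix.min())

# Reshape to one column, scale, then restore the original shape
scaled = MinMaxScaler().fit_transform(matrix.reshape(-1, 1)).reshape(matrix.shape)

# Both approaches should agree up to floating-point error
print(np.allclose(manual, scaled))
```

If the two arrays agree, the reshape approach is a drop-in replacement for the manual formula whenever a single global range is wanted.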

Upvotes: 1

WolVes

Reputation: 1336

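Your manual formula scales the entire matrix using its global min and max. MinMaxScaler is designed for machine learning, so it scales each column (feature) independently, using that column's own min and max. Your matrix has only two rows, so in every column the smaller value maps to 0 and the larger to 1, which is why you get only zeros and ones. To get the same results as your manual formula, you would need to turn the 2D array into a single column. I show this below, with a second random column added for comparison, and get your same results in the first column: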

import numpy as np
from sklearn.preprocessing import normalize, MinMaxScaler

np.random.seed(0)
t_feat=4
t_epoch=3
t_wind=2

result = [np.random.rand(t_epoch, t_feat) for _ in range(t_wind)]
wdw_epoch_feat=np.array(result)
matrix=wdw_epoch_feat[:,:,0]

xmax, xmin = matrix.max(), matrix.min()
x_norm = (matrix - xmin)/(xmax - xmin)


# Put the flattened matrix in the first column and random values in the second
matrix = np.array([matrix.flatten(), np.random.rand(len(matrix.flatten()))]).T
scaler = MinMaxScaler()
test = scaler.fit_transform(matrix)

print(test)
-------------------------------------------
[[0.55153917 0.        ]
 [0.42094786 0.63123194]
 [0.98439526 0.03034732]
 [0.57160496 1.        ]
 [0.         0.48835502]
 [1.         0.35865137]]

When you use MinMaxScaler for machine learning, you generally want to scale each column (feature) independently, which is what it does by default.
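
To make the per-column behaviour concrete, here is a minimal sketch comparing MinMaxScaler with a NumPy formula that uses each column's own min and max. The array `X` is made-up illustration data, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 3 samples (rows), 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [4.0, 20.0]])

# Per-column min-max scaling written out by hand (axis=0 reduces over rows)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

# MinMaxScaler with its default feature_range of (0, 1)
scaled = MinMaxScaler().fit_transform(X)

# The two results should match up to floating-point error
print(np.allclose(manual, scaled))
```

Each column ends up spanning 0 to 1 on its own, regardless of the values in the other column.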

Upvotes: 1
