OptimusPrime

Reputation: 619

Scaling an array (sklearn) - python

I have been trying to figure this out for the past few hours but keep running into issues. I want to use the MinMaxScaler from sklearn.

The formula is:

Xnorm = (X - Xmin) / (Xmax - Xmin)

I want to apply that formula to some columns of the array, but I am having trouble figuring out how to also apply an inverse formula to another column, like:

Xnorm = (Xmax - X) / (Xmax - Xmin)

My attempt: I want to apply min-max scaling to the 1st and 3rd columns of the array, and for the 2nd column I want the inverse scaling from the formula above.

import numpy as np
from sklearn import preprocessing

X = np.array([[-0.23685953,  0.04296864,  0.94160742],
              [-0.23685953,  1.05043547,  0.67673782],
              [0.12831355,  0.16017461,  0.27031023]])

# Scale only the 1st and 3rd columns (indices 0 and 2); X is a NumPy array, so .iloc does not apply
cols = [0, 2]
minmax_scale = preprocessing.MinMaxScaler().fit(X[:, cols])
X_std = minmax_scale.transform(X[:, cols])

Upvotes: 1

Views: 245

Answers (1)

Akshat

Reputation: 136

Calculating Xnorm for a particular column with the formula

Xnorm = (Xmax - X) / (Xmax - Xmin)

can be done by negating the values in that column and then applying the usual min-max scaling to it.

Proof

Suppose a column has maximum value A and minimum value B. After multiplying every value by -1, the new minimum is -A and the new maximum is -B.

The numerator (value minus new minimum) becomes (-X) - (-A) = A - X,

and the denominator (new maximum minus new minimum) becomes (-B) - (-A) = A - B, which is the same as before.
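
The argument can be checked numerically. The following is only a minimal sketch in plain Python: the helper names `min_max` and `inverse_min_max` are made up for illustration and are not part of sklearn, and the sample values are the 2nd column of the test array from the question. It assumes the column is not constant, so the denominator is non-zero.

```python
def min_max(values):
    # Ordinary min-max scaling: (x - min) / (max - min)
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def inverse_min_max(values):
    # Inverse formula from the question: (max - x) / (max - min)
    lo, hi = min(values), max(values)
    return [(hi - v) / (hi - lo) for v in values]


# 2nd column of the test array in the question
column = [0.04296864, 1.05043547, 0.16017461]

# Negate first, then apply ordinary min-max scaling
negated_then_scaled = min_max([-v for v in column])

# Apply the inverse formula directly
direct = inverse_min_max(column)

# Both routes should agree element by element
matches = all(abs(a - b) < 1e-12 for a, b in zip(negated_then_scaled, direct))
print(matches)
```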


Applying this to your test case:

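import numpy as np
from sklearn import preprocessing

X = np.array([[-0.23685953,  0.04296864,  0.94160742],
              [-0.23685953,  1.05043547,  0.67673782],
              [0.12831355,  0.16017461,  0.27031023]])

# Negate the 2nd column (index 1) so that ordinary min-max scaling inverts it.
# Note that this modifies X in place.
X[:, 1] = -1 * X[:, 1]

# Fit the scaler on all columns and transform them (each column is scaled independently)
minmax_scale = preprocessing.MinMaxScaler().fit(X)
X_std = minmax_scale.transform(X)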

Printing X_std gives:

array([[0.        , 1.        , 1.        ],
       [0.        , 0.        , 0.60543616],
       [1.        , 0.8836627 , 0.        ]])

This shows that the values in column 2 are the desired ones, i.e. the values given by the proposed inverse formula.
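
If you would rather not change the sign of the original data, one possible alternative (a sketch, not the approach used in this answer) relies on the identity (Xmax - X) / (Xmax - Xmin) = 1 - (X - Xmin) / (Xmax - Xmin): scale every column normally, then flip the column you want inverted. The variable name `inverse_cols` is an assumption introduced for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[-0.23685953,  0.04296864,  0.94160742],
              [-0.23685953,  1.05043547,  0.67673782],
              [0.12831355,  0.16017461,  0.27031023]])

# Columns that should use the inverse formula (here only the 2nd column, index 1)
inverse_cols = [1]

# Ordinary min-max scaling of every column; X itself is left untouched
X_std = MinMaxScaler().fit_transform(X)

# 1 - scaled value equals (Xmax - X) / (Xmax - Xmin) for the selected columns
X_std[:, inverse_cols] = 1.0 - X_std[:, inverse_cols]

print(X_std)
```

With the default feature range of 0 to 1 this should give the same values as the negation approach, while leaving X unchanged.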

Hope this will help.

Keep asking, keep growing :)

Upvotes: 1
