DrTchocky

Numpy array scaling not returning proper values

I have a NumPy array whose columns I want to scale: every value in a column is divided by the maximum value of that column, so that all values are ≤ 1.

A sample of the printed array is:

[ 2. 0. 367.877 ..., -0.358 51.547 -32.633]
[ 2. 0. 339.824 ..., -0.33 52.562 -27.581]
[ 3. 0. 371.438 ..., -0.406 55.108 -35.573]

I've tried scaling the array (data_in) with the following code:

# Normalize each column of data_in by that column's maximum
data_in_normalized = data_in / data_in.max(axis=0)
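To check whether a single max() value is being shared between columns, here is a minimal sketch using three columns copied from the sample rows above (the other columns are omitted, so this is an illustrative subset, not the full array). It contrasts max() with no axis, which returns one scalar, with max(axis=0), which returns one maximum per column.

```python
import numpy as np

# Three columns copied from the sample rows in the question (others omitted).
data_in = np.array([[367.877, 51.547, -32.633],
                    [339.824, 52.562, -27.581],
                    [371.438, 55.108, -35.573]])

# With no axis, max() reduces the whole array to a single scalar.
global_max = data_in.max()          # 371.438

# axis=0 reduces down the rows: one maximum per column, shape (3,).
col_max = data_in.max(axis=0)       # [371.438, 55.108, -27.581]

# Broadcasting divides each column by its own maximum.
data_in_normalized = data_in / col_max
```

With these values the third column is negative in every row, so its "maximum" is -27.581; dividing by it flips the signs and maps -35.573 to about 1.29.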

However, the output of data_in_normalized is:

[ 0.5 0. 0.95437199 0.89363654 0.80751792 ]
[ 0.46931238 0.50660904 0.5003812 0.91250444 0.625 ]
[ 0.96229214 0.89483109 0.86989432 0.86491407 0.71287646 ]
[ -23.90909091 0.34346373 1.25110652 0. 0.8537859 1. 1.]

Clearly, it didn't normalize: several values have a magnitude greater than 1 (e.g. -23.90909091 and 1.25110652). Is there a better way to scale the data, or am I using the max() function incorrectly (e.g. is a single max() value being shared across all columns)?
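One way to test these two hypotheses is to compare each column's minimum and maximum directly. The helper below is a hypothetical sketch (the name columns_exceeding_one is not part of NumPy): dividing a column by its maximum m gives some |v / m| > 1 exactly when the column's largest magnitude exceeds |m|, which happens when |min| > |max|.

```python
import numpy as np

def columns_exceeding_one(data):
    """Hypothetical diagnostic: indices of columns where
    data / data.max(axis=0) contains some value with magnitude > 1."""
    col_max = data.max(axis=0)
    col_min = data.min(axis=0)
    # The largest |v| in a column is max(|col_max|, |col_min|), so the
    # scaled column overflows [-1, 1] exactly when |col_min| > |col_max|.
    # (A column with max 0 and a negative min is flagged too: it divides by zero.)
    return np.flatnonzero(np.abs(col_min) > np.abs(col_max))

# Three columns copied from the sample rows in the question.
sample = np.array([[367.877, 51.547, -32.633],
                   [339.824, 52.562, -27.581],
                   [371.438, 55.108, -35.573]])
flagged = columns_exceeding_one(sample)  # -> array([2])
```

If every flagged column contains negative values, the issue lies with the sign of the data rather than with how max() treats columns.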

Answers (1)

DSM

IIUC, the maximum isn't being shared between columns: max(axis=0) already returns one maximum per column. The problem is that your columns contain elements of both signs, so you probably want to divide by the maximum absolute value instead. After all, 1 > -100, so if you divide a column containing [1, -100] by its maximum (1), nothing changes and -100 stays far outside [-1, 1].

For example:

>>> import numpy as np
>>> data_in = np.array([[-3,-2],[2,1]])
>>> data_in
array([[-3, -2],
       [ 2,  1]])
>>> data_in.max(axis=0)
array([2, 1])
>>> data_in / data_in.max(axis=0)
array([[-1.5, -2. ],
       [ 1. ,  1. ]])

but

>>> data_in / np.abs(data_in).max(axis=0)
array([[-1.        , -1.        ],
       [ 0.66666667,  0.5       ]])
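As a reusable version of the max-abs idea, here is a minimal sketch with hypothetical helper names. It also guards against all-zero columns: the second column in the question's sample appears to be 0. in every row shown, and dividing such a column by its max-abs value would give 0/0 = nan. Min-max scaling is included as a separate, commonly used alternative technique that maps each column onto [0, 1]; it is not what the answer above does.

```python
import numpy as np

def scale_by_max_abs(data):
    """Hypothetical helper: scale each column into [-1, 1] by dividing
    by that column's maximum absolute value."""
    data = np.asarray(data, dtype=float)
    max_abs = np.abs(data).max(axis=0)
    # Use a divisor of 1 for all-zero columns so they stay zero instead of nan.
    safe = np.where(max_abs == 0, 1.0, max_abs)
    return data / safe

def scale_min_max(data):
    """Hypothetical helper (min-max scaling): map each column onto [0, 1]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    span = data.max(axis=0) - col_min
    # Constant columns have span 0; map them to 0 instead of dividing by zero.
    span = np.where(span == 0, 1.0, span)
    return (data - col_min) / span

example = np.array([[-3, 0, -2],
                    [ 2, 0,  1]])
scaled_abs = scale_by_max_abs(example)   # [[-1, 0, -1], [0.667, 0, 0.5]]
scaled_mm = scale_min_max(example)       # [[0, 0, 0], [1, 0, 1]]
```

Max-abs scaling keeps zero at zero and preserves each value's sign; min-max scaling shifts the data, so a value's sign is no longer visible after scaling.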
