Sandy.Arv

Reputation: 605

Normalization of data

I have some code to normalise data imported from an Excel (.xlsx) file. It is as follows:

import numpy as np

# XLSImport (not shown) loads the sheet: Xt has shape (750, 8), Tt has shape (750, 2)
Xt, Tt = XLSImport('AI_sample.xlsx')

# calculate the maximum value of each column
# (Xt[:, 0] selects column 0; Xt[0] would select row 0)
valX1_max = np.max(Xt[:, 0])
valX2_max = np.max(Xt[:, 1])
valX3_max = np.max(Xt[:, 2])
valX4_max = np.max(Xt[:, 3])
valX5_max = np.max(Xt[:, 4])
valX6_max = np.max(Xt[:, 5])
valX7_max = np.max(Xt[:, 6])
valX8_max = np.max(Xt[:, 7])

valT1_max = np.max(Tt[:, 0])
valT2_max = np.max(Tt[:, 1])

print(valX1_max, valX2_max, valX3_max, valX4_max, valX5_max, valX6_max, valX7_max, valX8_max, valT1_max, valT2_max)


# normalize each column by its maximum
Xt[:, 0] /= valX1_max
Xt[:, 1] /= valX2_max
Xt[:, 2] /= valX3_max
Xt[:, 3] /= valX4_max
Xt[:, 4] /= valX5_max
Xt[:, 5] /= valX6_max
Xt[:, 6] /= valX7_max
Xt[:, 7] /= valX8_max
Tt[:, 0] /= valT1_max
Tt[:, 1] /= valT2_max

print(Xt, Tt)

This is rather simple code, where Xt and Tt are sets of data. Xt has dimensions (750, 8), where 750 is the number of rows and 8 is the number of columns, and Tt has dimensions (750, 2), with rows and columns as above. Each column is normalised by the maximum value in that particular column.
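Written out as a formula, this per-column normalisation divides each entry by the maximum of its own column (i indexes rows, j indexes columns):

$$\hat{x}_{ij} = \frac{x_{ij}}{\max_{k} x_{kj}}, \qquad i = 1,\dots,750,\quad j = 1,\dots,8$$

For non-negative data this maps every column into [0, 1], with the largest value in each column becoming exactly 1.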

Now I want to write a function with a loop so that I don't have to repeat the same code over and over, as in my example. How do I do that? I am new to programming and not very familiar with loops. Thank you in advance.

I want to have something like:

def norm(data):
    val_max = []
    for i in range(data.shape[1]):  # one pass per column
        # and the normalization inside this block
        pass

How should I do this?

Upvotes: 2

Views: 2945

Answers (2)

Leb

Reputation: 15953

Import your data into a NumPy array. You can then get the maximum of each column and divide the whole array by those maxima in a single step.

For example:

import numpy as np

arr = np.random.randint(0, 100, (10,5)) # replace this line with np.array
                                        # to load your data from Excel

print(arr)

[[41 71 95 62 26]
 [85 37  5 71 74]
 [14 75 93 70 66]
 [86 79 93  7 39]
 [ 4 84 97 92 24]
 [54 28 49 62 36]
 [37 63 84 45 88]
 [48 92 48 93 94]
 [47 74 22 58 94]
 [34 92 86 30 85]]

print(np.max(arr, axis=0))

[86 92 97 93 94]

print(arr/np.max(arr, axis=0))

[[ 0.47674419  0.77173913  0.97938144  0.66666667  0.27659574]
 [ 0.98837209  0.40217391  0.05154639  0.76344086  0.78723404]
 [ 0.1627907   0.81521739  0.95876289  0.75268817  0.70212766]
 [ 1.          0.85869565  0.95876289  0.07526882  0.41489362]
 [ 0.04651163  0.91304348  1.          0.98924731  0.25531915]
 [ 0.62790698  0.30434783  0.50515464  0.66666667  0.38297872]
 [ 0.43023256  0.68478261  0.86597938  0.48387097  0.93617021]
 [ 0.55813953  1.          0.49484536  1.          1.        ]
 [ 0.54651163  0.80434783  0.22680412  0.62365591  1.        ]
 [ 0.39534884  1.          0.88659794  0.32258065  0.90425532]]

The print calls are just for visualization; you only need:

import numpy as np

arr = np.random.randint(0, 100, (10,5))

norm = arr/np.max(arr, axis=0)
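Since the question asks for a function with a loop, here is a minimal sketch of both styles side by side: a loop over the columns (following the structure sketched in the question) and the broadcasting one-liner from this answer, wrapped in a function. The names `norm_loop` and `norm` are hypothetical, random integers stand in for the real `Xt` (because `XLSImport` isn't shown), and it assumes every column has a non-zero maximum.

```python
import numpy as np


def norm_loop(data):
    """Divide each column by its maximum using an explicit loop (hypothetical helper)."""
    # Float copy: avoids integer division and leaves the caller's array untouched.
    out = np.array(data, dtype=float)
    val_max = []
    for j in range(out.shape[1]):      # one iteration per column
        col_max = out[:, j].max()
        val_max.append(col_max)
        out[:, j] /= col_max           # assumes col_max != 0
    return out, val_max


def norm(data):
    """Same result without a loop: broadcasting divides every row by the column maxima."""
    data = np.asarray(data, dtype=float)
    col_max = data.max(axis=0)         # shape (n_cols,)
    return data / col_max, col_max


# Random stand-in for Xt with shape (750, 8); Tt (750, 2) works the same way.
rng = np.random.default_rng(0)
Xt = rng.integers(1, 100, size=(750, 8))

Xt_loop, max_loop = norm_loop(Xt)
Xt_vec, max_vec = norm(Xt)

print(np.allclose(Xt_loop, Xt_vec))   # True: both versions agree
print(Xt_vec.max(axis=0))             # every column's maximum is now 1.0
```

The loop version mirrors the question's structure, while `norm` lets NumPy do the per-column work, which is shorter and usually faster on large arrays.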

Upvotes: 1

Ward

Reputation: 2852

I would suggest using one of the many excellent data-processing libraries available in Python. pandas seems especially easy to use, and most of the things you will need are probably already built in!

You could consider Miniconda, a Python distribution that makes it very easy to install complex dependencies such as NumPy (which pandas depends on).

Once you have Python and pandas running, doing the normalisation is as easy as pie. See this answer for a good explanation!
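As a minimal sketch of what that looks like, assuming pandas is installed: the random DataFrame and the column names X1 to X8 below are made up to stand in for the spreadsheet (with a real file, `pd.read_excel('AI_sample.xlsx')` could load it instead, given an Excel engine such as openpyxl).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the (750, 8) spreadsheet data.
df = pd.DataFrame(np.random.default_rng(0).integers(1, 100, size=(750, 8)),
                  columns=[f"X{i}" for i in range(1, 9)])

# df.max() returns one maximum per column (a Series indexed by column name);
# dividing aligns on the column labels, so each column is divided by its own max.
normalized = df / df.max()

print(normalized.max())  # 1.0 for every column
```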

Good luck, and welcome to the exciting world of Python programming :)

Edit: after rereading your question, I think you already have NumPy running, so installing pandas with pip is even easier.

Upvotes: 1
