Brain_overflowed

Reputation: 420

Normalizing values in each column of a pandas dataframe

I have a huge dataframe and am trying to figure out the most efficient way to normalize every value in a column using that column's mean and standard deviation, and then repeat this for all the columns.

A sample of the dataframe is as follows:

    TimeStamp          340          341          342          343       
0    10:27:30     1.953036     2.110234     1.981548     1.705684  
1    10:28:30     1.973408     2.046361     1.806923     1.496244   
2    10:29:30     0.000000     0.000000     0.014881     0.198947   
3    10:30:30     2.567976     3.169928     3.479591     3.557881   
4    10:31:30  4415.498729  5075.996948  5653.925541  6133.202200   
5    10:32:30  4473.930295  5146.802497  5736.030854  6224.380260

I want to find the mean and standard deviation of each column, e.g. col["340"]:

    for column in df.iteritems():
        df.mean()
        df.std()
        # ...further calculations for normalizing

However, I am extremely new to pandas and it is not working. I can find the mean for a single column, but I have 2500 columns.

Upvotes: 1

Views: 1892

Answers (1)

DJK

Reputation: 9274

If you're looking to normalize the data, you can do this without a loop:

    (df.iloc[:, 1:] - df.iloc[:, 1:].mean()) / df.iloc[:, 1:].std()

This assumes you want (X - mean) / standard deviation normalization. Note: df.iloc[:, 1:] is used to exclude the first column (TimeStamp), which is not numeric.
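For reference, here is a minimal self-contained sketch of the same idea. It assumes the non-numeric column is named TimeStamp, as in the question's sample, and uses a three-row stand-in frame rather than the real data. The names `values`, `normalized` and `result` are only illustrative.

```python
import pandas as pd

# Small stand-in frame shaped like the sample in the question.
df = pd.DataFrame({
    "TimeStamp": ["10:27:30", "10:28:30", "10:29:30"],
    "340": [1.953036, 1.973408, 0.000000],
    "341": [2.110234, 2.046361, 0.000000],
})

# Select the numeric columns once (everything except TimeStamp).
values = df.drop(columns="TimeStamp")

# Column-wise (X - mean) / std. mean() and std() each return one value
# per column, and pandas aligns them with the frame on column labels.
normalized = (values - values.mean()) / values.std()

# Put the TimeStamp column back in front of the normalized columns.
result = pd.concat([df[["TimeStamp"]], normalized], axis=1)
print(result)
```

Because the subtraction and division are vectorized over the whole frame, this should work the same way for 2500 columns as for two. One caveat: a column whose values are all identical has a standard deviation of 0, so it would come out as NaN.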

Upvotes: 1
