Reputation: 420
I have a huge DataFrame and am trying to figure out the most efficient way to normalize each value in a column, and in turn every column, using that column's mean and standard deviation.
A sample of the dataframe is as follows:
TimeStamp 340 341 342 343
0 10:27:30 1.953036 2.110234 1.981548 1.705684
1 10:28:30 1.973408 2.046361 1.806923 1.496244
2 10:29:30 0.000000 0.000000 0.014881 0.198947
3 10:30:30 2.567976 3.169928 3.479591 3.557881
4 10:31:30 4415.498729 5075.996948 5653.925541 6133.202200
5 10:32:30 4473.930295 5146.802497 5736.030854 6224.380260
I want to find the mean (and standard deviation) of col["340"], and likewise for every other column:
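for name, column in df.iloc[:, 1:].items():  # skip the TimeStamp column
    mean = column.mean()  # stats for this column only, not the whole df
    std = column.std()
    # ...further calculations for normalizing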
However, I am extremely new to pandas and it is not working :( I can find the mean of a single column, but I have 2,500 columns.
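For anyone who wants to experiment, here is a minimal reproducible version of the sample above together with the per-column statistics. It assumes the column labels are the strings "340" to "343" (matching col["340"]); in the real data they might be integers. The point it illustrates is that DataFrame.mean() and DataFrame.std() already work column by column and return a Series indexed by column label, so no explicit loop over the 2,500 columns is needed.

```python
import pandas as pd

# Reproducible copy of the sample above.
# Assumption: column labels are strings ("340", ...), as in col["340"].
df = pd.DataFrame({
    "TimeStamp": ["10:27:30", "10:28:30", "10:29:30",
                  "10:30:30", "10:31:30", "10:32:30"],
    "340": [1.953036, 1.973408, 0.000000, 2.567976, 4415.498729, 4473.930295],
    "341": [2.110234, 2.046361, 0.000000, 3.169928, 5075.996948, 5146.802497],
    "342": [1.981548, 1.806923, 0.014881, 3.479591, 5653.925541, 5736.030854],
    "343": [1.705684, 1.496244, 0.198947, 3.557881, 6133.202200, 6224.380260],
})

# Keep only the numeric value columns (everything after TimeStamp)
values = df.iloc[:, 1:]

# Each call returns a Series with one entry per column, indexed by column label,
# so the statistics for every column come out of a single call
col_means = values.mean()
col_stds = values.std()  # sample standard deviation (ddof=1) by default

print(col_means["340"], col_stds["340"])
```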
Upvotes: 1
Views: 1892
Reputation: 9274
If you're looking to normalize the data, you can do this:
values = df.iloc[:, 1:]  # every column except TimeStamp
(values - values.mean()) / values.std()
This assumes you want (X - mean) / standard deviation normalization, i.e. a z-score for each column. Note: iloc[:, 1:] is used to exclude the first column, TimeStamp, which is not numeric.
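A minimal end-to-end sketch of that z-score normalization on a small stand-in frame (a few rows copied from the question; the real data would have 2,500 value columns). It takes mean() and std() from the same numeric subset and passes the resulting Series directly rather than via .values, so each column is matched to its own statistics by column label instead of by position. The final pd.concat that puts TimeStamp back in front is just one possible way to reassemble the result.

```python
import pandas as pd

# Small stand-in for the real data: a TimeStamp column plus numeric columns
df = pd.DataFrame({
    "TimeStamp": ["10:27:30", "10:28:30", "10:29:30"],
    "340": [1.953036, 1.973408, 0.000000],
    "341": [2.110234, 2.046361, 0.000000],
})

values = df.iloc[:, 1:]  # exclude the non-numeric TimeStamp column

# z-score every column at once: values.mean() and values.std() are Series
# indexed by column label, and DataFrame-minus-Series aligns on column labels,
# so each column is shifted and scaled by its own mean and standard deviation.
normalized = (values - values.mean()) / values.std()

# Reattach the TimeStamp column in front of the normalized values
result = pd.concat([df[["TimeStamp"]], normalized], axis=1)
print(result)
```

Computing the statistics on the same numeric subset also matters on newer pandas: from pandas 2.0 onward, calling df.mean() on a frame that still contains the string TimeStamp column raises a TypeError instead of silently skipping that column. Also note that std() defaults to the sample standard deviation (ddof=1); pass ddof=0 if you want the population version.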
Upvotes: 1