Reputation: 685
I'll create some test data:
import numpy as np
import pandas as pd
test_df = pd.DataFrame(np.random.randn(10,4), columns=['a','b','c','d'])
The function I'm trying to pass over the column variables is:
def standard(x):
    return (x - x.mean()) / x.std()
What I then want to do is run the function standard over each of the column variables using a for loop and add new columns to the data frame with the standardised data.
However, aside from making simple for loops I've not really done anything like this before. So any pointers would be much appreciated.
Upvotes: 0
Views: 720
Reputation: 7510
You can apply your function to every column like this (axis=0, the default, passes each column to the function; axis=1 would pass each row instead):
std_df = test_df.apply(standard, axis=0)
You can concatenate this to your original frame like this:
big_df = pd.concat([test_df, std_df], axis=1)
To simplify, you can run both of these together:
test_df = pd.concat([test_df, test_df.apply(standard, axis=0)], axis=1)
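One thing to watch with the concat approach: the standardised columns keep the names a, b, c, d, so the combined frame ends up with duplicate labels. A minimal sketch of one way around that, assuming a `_std` suffix and a fixed random seed (both are my own choices for illustration, not part of the question):

```python
import numpy as np
import pandas as pd

def standard(x):
    # z-score: subtract the mean, divide by the sample standard deviation
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)  # fixed seed, chosen only so the run is repeatable
test_df = pd.DataFrame(rng.standard_normal((10, 4)), columns=['a', 'b', 'c', 'd'])

# apply() defaults to axis=0, so standard() receives one column at a time
std_df = test_df.apply(standard).add_suffix('_std')

# Side-by-side concat: original columns first, then the suffixed ones
big_df = pd.concat([test_df, std_df], axis=1)
print(big_df.columns.tolist())
```

Each `_std` column should come out with mean close to 0 and standard deviation close to 1, which is a quick way to confirm the function ran per column.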
Upvotes: 2
Reputation: 2500
If you want to do it in one line, you can pass an anonymous (lambda) function to apply() on the source dataframe. Note that axis=1 standardises across each row, which is what the output below shows; to standardise each column instead, use axis=0 or leave the argument out:
test_df.apply(lambda x: (x - x.mean()) / x.std(), axis=1)
Out:
          a         b         c         d
0 -0.381437  0.090135 -1.038400  1.329702
1 -0.722698  0.902806  0.817258 -0.997366
2 -0.521621  1.375428 -0.912505  0.058698
3  0.161679  0.900105  0.363529 -1.425313
4 -0.605061  0.527289 -1.045744  1.123515
5 -1.271155  0.481572 -0.253487  1.043071
6 -0.473747 -0.471403 -0.553750  1.498901
7  0.161580  1.335381 -0.935835 -0.561127
8  1.355149 -0.372754  0.029416 -1.011811
9 -1.065852  1.253947  0.276011 -0.464105
For the question "How to loop a function through columns in a dataframe and add to new columns", you can use pandas.DataFrame.join to join the source dataframe test_df with its standardised columns; rsuffix gives the new columns the suffix _std:
test_df.join(test_df.apply(lambda x: (x - x.mean()) / x.std(), axis =1), rsuffix='_std')
          a         b         c         d     a_std     b_std     c_std     d_std
0 -0.142156  0.239129 -0.673335  1.241366 -0.381437  0.090135 -1.038400  1.329702
1 -0.726785  0.668620  0.595182 -0.962573 -0.722698  0.902806  0.817258 -0.997366
2 -1.212991  0.384315 -1.542113 -0.724365 -0.521621  1.375428 -0.912505  0.058698
3  0.366151  0.897416  0.511373 -0.775619  0.161679  0.900105  0.363529 -1.425313
4 -0.248543  0.830104 -0.668326  1.398053 -0.605061  0.527289 -1.045744  1.123515
5 -1.001275  0.695089 -0.016333  1.238530 -1.271155  0.481572 -0.253487  1.043071
6 -0.850323 -0.848182 -0.923375  0.950963 -0.473747 -0.471403 -0.553750  1.498901
7  0.337371  0.688291  0.009287  0.121310  0.161580  1.335381 -0.935835 -0.561127
8  1.976673 -0.953941 -0.271841 -2.037817  1.355149 -0.372754  0.029416 -1.011811
9 -1.795365  0.058748 -0.722873 -1.314415 -1.065852  1.253947  0.276011 -0.464105
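Since the question asks for each column to be standardised, it may help to check which direction `axis` works in. Here is a small sketch on a hand-made frame; the values and the names `small` and `zscore` are purely illustrative. `axis=0` hands each column to the function, while `axis=1` hands it each row.

```python
import pandas as pd

# Tiny hand-made frame so the two directions are easy to tell apart
small = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 60.0]})

def zscore(x):
    return (x - x.mean()) / x.std()

by_column = small.apply(zscore, axis=0)  # each column is standardised on its own
by_row = small.apply(zscore, axis=1)     # each row is standardised on its own

# Joining the per-column result; rsuffix renames the overlapping right-hand columns
joined = small.join(by_column, rsuffix='_std')
print(joined)
```

With `axis=0` it is the column means that come out as 0; with `axis=1` it is the row means, so the two results generally differ.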
Upvotes: 1
Reputation: 4051
You can do it like this:
for column in test_df:
    print(standard(test_df[column]))
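The loop above only prints the standardised values. To also add them to the data frame as new columns, as the question asks, something along these lines should work. It is a sketch that assumes a `_std` suffix for the new column names (my own naming choice):

```python
import numpy as np
import pandas as pd

def standard(x):
    return (x - x.mean()) / x.std()

test_df = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])

# Copy the column names first so the loop only visits the original columns,
# not the ones it adds along the way
for column in list(test_df.columns):
    test_df[column + '_std'] = standard(test_df[column])

print(test_df.head())
```

Taking `list(test_df.columns)` up front keeps the loop independent of the columns being added inside it.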
Upvotes: 1