tom91

Reputation: 685

How to loop a function through columns in a data frame and add to new columns

I'll create some test data:

import numpy as np
import pandas as pd
test_df = pd.DataFrame(np.random.randn(10,4), columns=['a','b','c','d'])

The function I'm trying to pass over the column variables is:

def standard(x):
    return (x - x.mean()) / x.std()

What I then want to do is run the function standard over each of the column variables using a for loop and add new columns to the data frame with the standardised data.

However, aside from making simple for loops I've not really done anything like this before. So any pointers would be much appreciated.

Upvotes: 0

Views: 720

Answers (3)

Christian Sloper

Reputation: 7510

You can apply your function to every column like this:

std_df = test_df.apply(standard, axis=0)

You can concatenate this to your original frame like this:

big_df = pd.concat([test_df, std_df], axis=1)

To simplify, you can run both of these together:

test_df = pd.concat([test_df, test_df.apply(standard, axis=0)], axis=1)
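
One thing to watch with the concat approach is that the standardised columns keep the names a, b, c, d, so the combined frame ends up with duplicate labels. Here is a minimal sketch of one way around that, assuming a `_std` suffix (my choice of name, not something from the question) and per-column standardisation:

```python
import numpy as np
import pandas as pd


def standard(x):
    return (x - x.mean()) / x.std()


test_df = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])

# axis=0 passes each column to standard(); add_suffix renames the results
# so they do not clash with the original column names ('_std' is an assumption)
std_df = test_df.apply(standard, axis=0).add_suffix('_std')

# Place the standardised columns next to the originals
big_df = pd.concat([test_df, std_df], axis=1)
print(big_df.columns.tolist())
```

With distinct names, `big_df['a']` and `big_df['a_std']` each select a single column.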

Upvotes: 2

n1tk

Reputation: 2500

If you want a one-liner, you can pass an anonymous (lambda) function to apply() on the source dataframe. Note that axis=1 standardises across each row; to standardise each column, as the question asks, use axis=0 (the default).

test_df.apply(lambda x: (x - x.mean()) / x.std(), axis=1)
Out:

          a         b         c         d
0 -0.381437  0.090135 -1.038400  1.329702
1 -0.722698  0.902806  0.817258 -0.997366
2 -0.521621  1.375428 -0.912505  0.058698
3  0.161679  0.900105  0.363529 -1.425313
4 -0.605061  0.527289 -1.045744  1.123515
5 -1.271155  0.481572 -0.253487  1.043071
6 -0.473747 -0.471403 -0.553750  1.498901
7  0.161580  1.335381 -0.935835 -0.561127
8  1.355149 -0.372754  0.029416 -1.011811
9 -1.065852  1.253947  0.276011 -0.464105
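
The axis argument decides whether apply() hands the function a column or a row, which changes the result of a standardisation. A small sketch on a made-up three-row frame (the frame and its values are only for illustration):

```python
import pandas as pd


def standard(x):
    return (x - x.mean()) / x.std()


# Hypothetical toy data, chosen so the two results clearly differ
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 60.0]})

# axis=0: standard() receives each column, so every column ends up
# with mean 0 and standard deviation 1
by_column = df.apply(standard, axis=0)

# axis=1: standard() receives each row, so every row is centred instead
by_row = df.apply(standard, axis=1)

print(by_column)
print(by_row)
```

If the goal is standardised column variables, the axis=0 result is the one to keep.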

To answer the title question ("How to loop a function through columns in a dataframe and add to new columns"), you can use pandas.DataFrame.join to join the source dataframe test_df with its standardised columns; rsuffix gives the new columns distinct names:

test_df.join(test_df.apply(lambda x: (x - x.mean()) / x.std(), axis=1), rsuffix='_std')
Out:

          a         b         c         d     a_std     b_std     c_std     d_std
0 -0.142156  0.239129 -0.673335  1.241366 -0.381437  0.090135 -1.038400  1.329702
1 -0.726785  0.668620  0.595182 -0.962573 -0.722698  0.902806  0.817258 -0.997366
2 -1.212991  0.384315 -1.542113 -0.724365 -0.521621  1.375428 -0.912505  0.058698
3  0.366151  0.897416  0.511373 -0.775619  0.161679  0.900105  0.363529 -1.425313
4 -0.248543  0.830104 -0.668326  1.398053 -0.605061  0.527289 -1.045744  1.123515
5 -1.001275  0.695089 -0.016333  1.238530 -1.271155  0.481572 -0.253487  1.043071
6 -0.850323 -0.848182 -0.923375  0.950963 -0.473747 -0.471403 -0.553750  1.498901
7  0.337371  0.688291  0.009287  0.121310  0.161580  1.335381 -0.935835 -0.561127
8  1.976673 -0.953941 -0.271841 -2.037817  1.355149 -0.372754  0.029416 -1.011811
9 -1.795365  0.058748 -0.722873 -1.314415 -1.065852  1.253947  0.276011 -0.464105

pandas.Series.apply

pandas.DataFrame.apply

Upvotes: 1

ChaosPredictor

Reputation: 4051

You can loop over the columns like this (this prints each standardised column rather than storing it):

for column in test_df:
    print(standard(test_df[column]))
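
To store the results as new columns instead of printing them, the same loop can assign back into the frame. This is a minimal sketch; the `_std` suffix for the new column names is an assumption, not something the question specifies:

```python
import numpy as np
import pandas as pd


def standard(x):
    return (x - x.mean()) / x.std()


test_df = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])

# Copy the column names into a list first, so the loop only visits the
# original columns and not the ones added inside it
for column in list(test_df.columns):
    # '_std' is a hypothetical naming choice for the new columns
    test_df[column + '_std'] = standard(test_df[column])

print(test_df.head())
```

Each new column sits alongside its source, so `test_df['a_std']` holds the standardised version of `test_df['a']`.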

Upvotes: 1
