user11580242

Reputation: 63

Loop through columns in Pandas dataframe

I have a pandas dataframe and would like to loop through all of its columns and apply a math function to each one, but I am unable to get the desired result. Below is my sample dataframe with 3 columns.

mydf=pd.DataFrame({'ID1':[9,3,7,5], 'ID2':[15,10,3,8],'ID3':[20,14,10,2]})

mydf

   ID1  ID2  ID3
0    9   15   20
1    3   10   14
2    7    3   10
3    5    8    2

The code below is what I need to do for every column, and it works perfectly. However, this is just a toy dataset; my actual dataframe has over 500 columns, and my attempt to loop through all of them is not giving the desired result.

tmp_df=mydf.copy()

tmp_df['ID1']=np.log(mydf.iloc[:,0]).diff(1)
tmp_df['ID2']=np.log(mydf.iloc[:,1]).diff(1)
tmp_df['ID3']=np.log(mydf.iloc[:,2]).diff(1)
tmp_df

        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438

Basically, I need the above result using a loop, as I have 500 columns to process.

I've tried the following:

for (i,j) in tmp_df.iteritems():
    #tmp_df['j']=np.log(mydf.iloc[:,0]).diff(1)
    j=np.log(mydf.iloc[:,0]).diff(1)
    print('Column:',i)
    print('Values:',j.values)

However, this loop prints the values as arrays and does not iterate over all the columns as I wanted. I think this can be done fairly easily, but I am unable to get it to work. I would appreciate it if anyone could help me with an efficient way of doing this for all 500 columns.

Expected result using any looping logic:

        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438

Upvotes: 1

Views: 1520

Answers (1)

MichaelD

Reputation: 1326

One way to do this is to use apply, which calls the function once per column, so there is no need to write the loop yourself.

In [48]: mydf=pd.DataFrame({'ID1':[9,3,7,5], 'ID2':[15,10,3,8],'ID3':[20,14,10,2]})

In [49]: mydf.apply(lambda x: np.log(x).diff(1), axis='rows')
Out[49]:
        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438

The result is a dataframe, so if you need to keep it, just assign it to a new variable as usual.

In [50]: new_mydf = mydf.apply(lambda x: np.log(x).diff(1), axis='rows')

In [51]: print(new_mydf)
        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
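
If you do want an explicit loop, here is a minimal sketch of how the attempt in the question could be corrected. The original loop always read mydf.iloc[:,0] and only rebound the loop variable j, so nothing was ever written back to the dataframe. The name loop_df is just an illustrative choice, not something from the question.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'ID1': [9, 3, 7, 5], 'ID2': [15, 10, 3, 8], 'ID3': [20, 14, 10, 2]})

# Work on a copy so the original values stay available as input
loop_df = mydf.copy()

for col in mydf.columns:
    # Read the *current* column (not always column 0) and
    # write the result back into the dataframe by column name
    loop_df[col] = np.log(mydf[col]).diff(1)

print(loop_df)
```

This should give the same dataframe as the apply version, just with the per-column step spelled out.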

EDIT: Adding more detail on renaming the columns after the apply, in answer to the OP's comment.

In [58]: new_mydf = mydf.apply(lambda x: np.log(x).diff(1), axis='rows').rename(lambda c_name: f'new_{c_name}', axis='columns')

In [59]: print(new_mydf)
    new_ID1   new_ID2   new_ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
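
As a possible shortcut, np.log also accepts a whole dataframe and DataFrame.diff works column by column, so the same result can probably be had without apply or a lambda. The sketch below assumes every column is numeric and positive, as in the sample data; add_prefix is used here as a shorter alternative to rename, and new_df is just an illustrative name.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'ID1': [9, 3, 7, 5], 'ID2': [15, 10, 3, 8], 'ID3': [20, 14, 10, 2]})

# Log of every cell, then the row-to-row difference within each column,
# then prefix every column name with 'new_'
new_df = np.log(mydf).diff(1).add_prefix('new_')

print(new_df)
```

With 500 columns this avoids one Python-level function call per column, though for a frame of that size either form should be fast enough.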

Upvotes: 1
