Reputation: 63
I have a pandas DataFrame and would like to loop through all of its columns and apply a math function to each one, but I am unable to get the desired result. Below is my sample DataFrame with 3 columns.
import numpy as np
import pandas as pd
mydf=pd.DataFrame({'ID1':[9,3,7,5], 'ID2':[15,10,3,8],'ID3':[20,14,10,2]})
mydf
   ID1  ID2  ID3
0    9   15   20
1    3   10   14
2    7    3   10
3    5    8    2
This is what I need to do for every column, and it works perfectly. However, this is just a toy dataset; my actual DataFrame has over 500 columns, and when I try to loop through all of them I don't get the desired result.
tmp_df=mydf.copy()
tmp_df['ID1']=np.log(mydf.iloc[:,0]).diff(1)
tmp_df['ID2']=np.log(mydf.iloc[:,1]).diff(1)
tmp_df['ID3']=np.log(mydf.iloc[:,2]).diff(1)
tmp_df
        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
Basically, I need the above result using a loop, as I have 500 columns to process. This is what I've tried:
for (i,j) in tmp_df.iteritems():
    #tmp_df['j']=np.log(mydf.iloc[:,0]).diff(1)
    j=np.log(mydf.iloc[:,0]).diff(1)
    print('Column:',i)
    print('Values:',j.values)
But this loop only prints the values as arrays, and it does not iterate over all the columns as I wanted. I think this can be done pretty easily, but I am unable to get it to work. I would appreciate help with an efficient way of doing this for all 500 columns.
Expected result using any looping logic:
        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
Upvotes: 1
Views: 1520
Reputation: 1326
One way to do this is to use apply, which runs the function on each column, so there is no need to write an explicit loop:
In [48]: mydf=pd.DataFrame({'ID1':[9,3,7,5], 'ID2':[15,10,3,8],'ID3':[20,14,10,2]})
In [49]: mydf.apply(lambda x: np.log(x).diff(1), axis='rows')
Out[49]:
        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
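As a side note, since np.log is a NumPy ufunc and diff works column by column by default, the same computation can probably be written on the whole frame without apply. This is only a sketch on the toy data from the question; the name vec_df is just an illustrative choice, and I have not timed it on a 500-column frame.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'ID1': [9, 3, 7, 5], 'ID2': [15, 10, 3, 8], 'ID3': [20, 14, 10, 2]})

# np.log applied to a DataFrame returns a DataFrame with the same labels;
# diff(1) then subtracts the previous row within each column.
vec_df = np.log(mydf).diff(1)

# Same computation via apply, for comparison.
apply_df = mydf.apply(lambda x: np.log(x).diff(1), axis='rows')

print(vec_df)
```

Both forms should give the same frame, with NaN in the first row because there is no previous value to difference against.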
The result is a DataFrame, so if you want to keep it, just assign it to a new variable as usual:
In [50]: new_mydf = mydf.apply(lambda x: np.log(x).diff(1), axis='rows')
In [51]: print(new_mydf)
        ID1       ID2       ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
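If you really do want an explicit loop, as the question asks, a minimal sketch could look like the one below. The attempt in the question always reads mydf.iloc[:,0] and only rebinds the loop variable j, so nothing is ever written back to the frame. Here each column is read by name and the result is stored in a new frame. The names loop_df, col_name and col_values are my own, and I use items() because iteritems() was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'ID1': [9, 3, 7, 5], 'ID2': [15, 10, 3, 8], 'ID3': [20, 14, 10, 2]})

# Start from an empty frame that shares the original index.
loop_df = pd.DataFrame(index=mydf.index)

# items() yields (column name, column Series) pairs, one per column.
for col_name, col_values in mydf.items():
    # Use the current column, not a fixed position, and store the result.
    loop_df[col_name] = np.log(col_values).diff(1)

print(loop_df)
```

This should match the apply result, but apply is shorter and avoids building the frame one column at a time.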
EDIT: To answer the OP's comment, here is how to rename the columns after the apply:
In [58]: new_mydf = mydf.apply(lambda x: np.log(x).diff(1), axis='rows').rename(lambda c_name: f'new_{c_name}', axis='columns')
In [59]: print(new_mydf)
    new_ID1   new_ID2   new_ID3
0       NaN       NaN       NaN
1 -1.098612 -0.405465 -0.356675
2  0.847298 -1.203973 -0.336472
3 -0.336472  0.980829 -1.609438
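For a plain prefix like this, add_prefix is a possible shortcut for the rename call with a lambda. This is a sketch under the assumption that every column should get the same 'new_' prefix; anything more selective would still need rename.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame({'ID1': [9, 3, 7, 5], 'ID2': [15, 10, 3, 8], 'ID3': [20, 14, 10, 2]})

# add_prefix prepends the string to every column label of the result.
new_mydf = mydf.apply(lambda x: np.log(x).diff(1)).add_prefix('new_')

print(new_mydf.columns.tolist())
```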
Upvotes: 1