Reputation: 5021
I want to apply a simple operation on a pandas dataframe to get the percentage change for each row. I have data like this:
colA colB colC colD
39.5 41 41.5 40.5
15.5 17 17.5 16.5
21.5 23 23.5 22.5
40.5 42 42.5 41.5
9.5 11 11.5 10.5
26.5 28 28.5 27.5
My code
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
print(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)
df['change'] = df.apply(lambda x: (((x.colA/ np.mean(x.iloc[:,2:], axis=1))-1)*100))
When I print the result, it gives me exactly what I want, but when I use df.apply to create a column, I get the following error:
Traceback (most recent call last):
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8543)
TypeError: an integer is required
Any suggestions? Where am I going wrong?
Upvotes: 1
Views: 692
Reputation: 862521
The output of your expression is a Series with the same index as df, so simply assign it to a new column:
df['change'] = (((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)
print (df)
colA colB colC colD change
0 39.5 41 41.5 40.5 -3.658537
1 15.5 17 17.5 16.5 -8.823529
2 21.5 23 23.5 22.5 -6.521739
3 40.5 42 42.5 41.5 -3.571429
4 9.5 11 11.5 10.5 -13.636364
5 26.5 28 28.5 27.5 -5.357143
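If you want to keep apply, the error most likely comes from the default axis=0: each x is then a column Series with an integer index, so x.colA triggers a label lookup in an integer index. A row-wise sketch, assuming the frame still has only the original four columns:
# axis=1 passes each row as a Series; iloc[2:] picks colC and colD by position
df['change'] = df.apply(lambda x: (x['colA'] / x.iloc[2:].mean() - 1) * 100, axis=1)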
Or use assign:
df = df.assign(change=(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100))
print (df)
colA colB colC colD change
0 39.5 41 41.5 40.5 -3.658537
1 15.5 17 17.5 16.5 -8.823529
2 21.5 23 23.5 22.5 -6.521739
3 40.5 42 42.5 41.5 -3.571429
4 9.5 11 11.5 10.5 -13.636364
5 26.5 28 28.5 27.5 -5.357143
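Since assign returns a new DataFrame, it also fits into a method chain; a small sketch, assuming the same data.csv:
# assign accepts a callable, so the column can be built right after read_csv
df = (pd.read_csv('data.csv')
        .assign(change=lambda d: (d.colA / d.iloc[:, 2:].mean(axis=1) - 1) * 100))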
It is also possible to use only pandas functions - div + iloc + mean + sub + mul:
df['change'] = df.colA.div(df.iloc[:,2:].mean(1)).sub(1).mul(100)
print (df)
colA colB colC colD change
0 39.5 41 41.5 40.5 -3.658537
1 15.5 17 17.5 16.5 -8.823529
2 21.5 23 23.5 22.5 -6.521739
3 40.5 42 42.5 41.5 -3.571429
4 9.5 11 11.5 10.5 -13.636364
5 26.5 28 28.5 27.5 -5.357143
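A quick equivalence check of the operator form and the method chain (a sketch; assert_series_equal raises if the two differ):
s1 = (df.colA / np.mean(df.iloc[:, 2:], axis=1) - 1) * 100
s2 = df.colA.div(df.iloc[:, 2:].mean(1)).sub(1).mul(100)
pd.testing.assert_series_equal(s1, s2)  # passes: both give the same float Series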
Upvotes: 2