muazfaiz
muazfaiz

Reputation: 5021

Calculating a new column in Python using pandas

I want apply simple operation on pandas dataframe to get percentage change for each row. I have data like this

colA    colB    colC    colD
39.5    41      41.5    40.5
15.5    17      17.5    16.5
21.5    23      23.5    22.5
40.5    42      42.5    41.5
9.5     11      11.5    10.5
26.5    28      28.5    27.5

My code

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')
print(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)

df['change'] = df.apply(lambda x: (((x.colA/ np.mean(x.iloc[:,2:], axis=1))-1)*100))

When I print the results it gives me exactly the thing I want, but when I do df.apply to create a column, it gives me the following error

Traceback (most recent call last):
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
  File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8543)
TypeError: an integer is required

Any suggestions ? Where I am doing it wrong ?

Upvotes: 1

Views: 692

Answers (1)

jezrael
jezrael

Reputation: 862521

Output of your function is Series with same index as df, so simple assign it to new column:

df['change'] = (((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)
print (df)
   colA  colB  colC  colD     change
0  39.5    41  41.5  40.5  -3.658537
1  15.5    17  17.5  16.5  -8.823529
2  21.5    23  23.5  22.5  -6.521739
3  40.5    42  42.5  41.5  -3.571429
4   9.5    11  11.5  10.5 -13.636364
5  26.5    28  28.5  27.5  -5.357143

Or use assign:

df = df.assign(change=(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100))
print (df)
   colA  colB  colC  colD     change
0  39.5    41  41.5  40.5  -3.658537
1  15.5    17  17.5  16.5  -8.823529
2  21.5    23  23.5  22.5  -6.521739
3  40.5    42  42.5  41.5  -3.571429
4   9.5    11  11.5  10.5 -13.636364
5  26.5    28  28.5  27.5  -5.357143

Also is possible use pandas functions only - div + iloc + mean + sub + mul:

df['change'] = df.colA.div(df.iloc[:,2:].mean(1)).sub(1).mul(100)
print (df)
   colA  colB  colC  colD     change
0  39.5    41  41.5  40.5  -3.658537
1  15.5    17  17.5  16.5  -8.823529
2  21.5    23  23.5  22.5  -6.521739
3  40.5    42  42.5  41.5  -3.571429
4   9.5    11  11.5  10.5 -13.636364
5  26.5    28  28.5  27.5  -5.357143

Upvotes: 2

Related Questions