Reputation: 309
I would like to define custom functions with signatures that includes one or several columns of the dataframe.
I tried to replicate the solution mentioned in article "How do I use Pandas 'apply' function to multiple columns?", but I fail to understand the way a function needs to be set up to accept data from other columns as input.
A sample of my code:
import pandas as pd
df=pd.DataFrame({'NAME':['A','B','C','D'],'HOURS':[38, 52, 1040, 28],'ROLE':['Manager','Expert','Expert','Expert']})
def apply_rate(col1='HOURS', col2='ROLE'):
if row[col2]=='Manager': return row[col1]*165
else: return row[col1]*135
df['TOTAL']=df.apply(lambda row: apply_rate(row['HOURS'],row['ROLE']),axis=1)
I get a message "KeyError: ('Manager', 'occurred at index 0')", but I am stuck at this stage, and I don't know how to get out of this blocking point.
Upvotes: 1
Views: 147
Reputation: 164673
The trick is to remove the lambda
altogether. Feed your function to pd.DataFrame.apply
and, possibly, feed your additional function parameters to apply
directly:
def apply_rate(row, col1, col2):
if row[col2]=='Manager': return row[col1]*165
else: return row[col1]*135
df['TOTAL'] = df.apply(apply_rate, axis=1, col1='HOURS', col2='ROLE')
print(df)
NAME HOURS ROLE TOTAL
0 A 38 Manager 6270
1 B 52 Expert 7020
2 C 1040 Expert 140400
3 D 28 Expert 3780
However, row-wise operations are inefficient and not a recommended use of Pandas. You can easily vectorise your algorithm with column-wise operations:
df['TOTAL'] = df['HOURS'] * np.where(df['ROLE'] == 'Manager', 165, 135)
An alternative, more easily extendable version can make use of a dictionary mapping:
factor_map = {'Manager': 165}
df['TOTAL'] = df['HOURS'] * df['ROLE'].map(factor_map).fillna(135)
Upvotes: 3