hamslice
hamslice

Reputation: 149

Use lambda with pandas to calculate a new column conditional on existing column

I need to create a new column in a pandas DataFrame which is calculated as the ratio of 2 existing columns in the DataFrame. However, the denominator in the ratio calculation will change based on the value of a string which is found in another column in the DataFrame.

Example. Sample dataset :

import pandas as pd
df = pd.DataFrame(data={'hand'      : ['left','left','both','both'], 
                        'exp_force' : [25,28,82,84], 
                        'left_max'  : [38,38,38,38], 
                        'both_max'  : [90,90,90,90]})

I need to create a new DataFrame column df['ratio'] based on the condition of df['hand'].

If df['hand']=='left' then df['ratio'] = df['exp_force'] / df['left_max']

If df['hand']=='both' then df['ratio'] = df['exp_force'] / df['both_max']

Upvotes: 2

Views: 4185

Answers (3)

Pierre-Loic
Pierre-Loic

Reputation: 1564

You can use the apply() method of your dataframe :

df['ratio'] = df.apply(
    lambda x: x['exp_force'] / x['left_max'] if x['hand']=='left' else x['exp_force'] / x['both_max'],
    axis=1
)

Upvotes: 2

David Erickson
David Erickson

Reputation: 16683

You can use np.where():

import pandas as pd
df = pd.DataFrame(data={'hand'      : ['left','left','both','both'], 
                        'exp_force' : [25,28,82,84], 
                        'left_max'  : [38,38,38,38], 
                        'both_max'  : [90,90,90,90]})
df['ratio'] = np.where((df['hand']=='left'), df['exp_force'] / df['left_max'], df['exp_force'] / df['both_max'])
df

Out[42]: 
   hand  exp_force  left_max  both_max     ratio
0  left         25        38        90  0.657895
1  left         28        38        90  0.736842
2  both         82        38        90  0.911111
3  both         84        38        90  0.933333

Alternatively, in a real-life scenario, if you have lots of conditions and results, then you can use np.select(), so that you don't have to keep repeating your np.where() statement as I have done a lot in my older code. It's better to use np.select in these situations:

import pandas as pd
df = pd.DataFrame(data={'hand'      : ['left','left','both','both'], 
                        'exp_force' : [25,28,82,84], 
                        'left_max'  : [38,38,38,38], 
                        'both_max'  : [90,90,90,90]})
c1 = (df['hand']=='left')
c2 = (df['hand']=='both')
r1 = df['exp_force'] / df['left_max']
r2 = df['exp_force'] / df['both_max']
conditions = [c1,c2]
results = [r1,r2]
df['ratio'] = np.select(conditions,results)
df
Out[430]: 
   hand  exp_force  left_max  both_max     ratio
0  left         25        38        90  0.657895
1  left         28        38        90  0.736842
2  both         82        38        90  0.911111
3  both         84        38        90  0.933333

Upvotes: 3

AtanuCSE
AtanuCSE

Reputation: 8940

Enumerate

for i,e in enumerate(df['hand']):
 
  if e == 'left':
    df.at[i,'ratio'] = df.at[i,'exp_force'] / df.at[i,'left_max']
  if e == 'both':
    df.at[i,'ratio'] = df.at[i,'exp_force'] / df.at[i,'both_max']
df

Output:

    hand    exp_force   left_max    both_max    ratio
0   left    25            38          90      0.657895
1   left    28            38          90      0.736842
2   both    82            38          90      0.911111
3   both    84            38          90      0.933333

Upvotes: 1

Related Questions