Eduardo Espinola
Eduardo Espinola

Reputation: 35

Applying lambda row on df with if statement

I have two data frames, trying to use entries from df1 to limit amounts in df2, then add them up. It seems like my code is limiting right, but not adding the amounts up.

Code:

import pandas as pd

df1 = pd.DataFrame({'Caps':['25','45','65']})

df2 = pd.DataFrame({'Amounts':['45','25','65','35','85']})

df1['Capped'] = df1.apply(lambda row: df2['Amounts'].where(
     df2['Amounts'] <= row['Caps'], row['Caps']).sum(), axis=1)

Output:

>>> df1

  Caps      Capped
0   25  2525252525
1   45  4525453545
2   65  4525653565

Upvotes: 2

Views: 155

Answers (1)

jezrael
jezrael

Reputation: 862641

First is necessary convert values to integers by Series.astype:

df1['Caps'] = df1['Caps'].astype(int)
df2['Amounts'] = df2['Amounts'].astype(int)
df1['Capped'] = df1.apply(lambda row: df2['Amounts'].where(
                   df2['Amounts'] <= row['Caps'], row['Caps']).sum(), axis=1)

print (df1)
   Caps  Capped
0    25     125
1    45     195
2    65     235

For improve performance is possible use numpy.where with broadcasting:

df1['Caps'] = df1['Caps'].astype(int)
df2['Amounts'] = df2['Amounts'].astype(int)

am = df2['Amounts'].to_numpy()
ca = df1['Caps'].to_numpy()
#pandas below 0.24
#am = df2['Amounts'].values
#ca = df1['Caps'].values
df1['Capped'] = np.where(am <= ca[:, None], am[None, :], ca[:, None]).sum(axis=1)
print (df1)
   Caps  Capped
0    25     125
1    45     195
2    65     235

Upvotes: 1

Related Questions