Applying lambda row on df with if statement

Question

I have two data frames, trying to use entries from df1 to limit amounts in df2, then add them up. It seems like my code is limiting right, but not adding the amounts up.

Code:

import pandas as pd

df1 = pd.DataFrame({'Caps':['25','45','65']})

df2 = pd.DataFrame({'Amounts':['45','25','65','35','85']})

df1['Capped'] = df1.apply(lambda row: df2['Amounts'].where(
     df2['Amounts'] <= row['Caps'], row['Caps']).sum(), axis=1)

Output:

>>> df1

  Caps      Capped
0   25  2525252525
1   45  4525453545
2   65  4525653565

jezrael · Accepted Answer

First is necessary convert values to integers by Series.astype:

df1['Caps'] = df1['Caps'].astype(int)
df2['Amounts'] = df2['Amounts'].astype(int)
df1['Capped'] = df1.apply(lambda row: df2['Amounts'].where(
                   df2['Amounts'] <= row['Caps'], row['Caps']).sum(), axis=1)

print (df1)
   Caps  Capped
0    25     125
1    45     195
2    65     235

For improve performance is possible use numpy.where with broadcasting:

df1['Caps'] = df1['Caps'].astype(int)
df2['Amounts'] = df2['Amounts'].astype(int)

am = df2['Amounts'].to_numpy()
ca = df1['Caps'].to_numpy()
#pandas below 0.24
#am = df2['Amounts'].values
#ca = df1['Caps'].values
df1['Capped'] = np.where(am <= ca[:, None], am[None, :], ca[:, None]).sum(axis=1)
print (df1)
   Caps  Capped
0    25     125
1    45     195
2    65     235

Applying lambda row on df with if statement

Answers (1)

Related Questions