Reputation: 787
What I'm trying to accomplish is a conditional check that looks at the value in the amount column. If the value in df['amount'] is less than 200k, nothing is done. If the value in df['amount'] is greater than or equal to 200k, replace the corresponding with either the value in months or 36.00 depending on whichever is greater.
import pandas as pd
df['amount'] = [332374.00, 22250.75, 45282.10, 339720.00, 1100.00, 40000.00, 15000.00, 207820.00, 497432.00]
df['months'] = [18.00, 17.00, 16.00, 46.00, 14.00, 13.00, 13.00, 13.00, 12.00]
df['checks'] = [1.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00, 1.00, 1.00]
The desired output is a dataset that looks like this.
amount months checks
332374.00 36.00 1.00
22250.75 17.00 0.00
45282.10 16.00 0.00
339720.00 46.00 1.00
1100.00 14.00 0.00
40000.00 13.00 0.00
15000.00 13.00 0.00
207820.00 36.00 1.00
497432.00 36.00 1.00
Upvotes: 1
Views: 49
Reputation: 109546
You can just use loc
to find instances where the amount exceeds your 200k threshold and where months is less than 36. Then just set those values to 36.
df.loc[(df['amount'] >= 200000) & (df['months'] < 36), 'months'] = 36
>>> df
amount months checks
0 332374.00 36 1
1 22250.75 17 0
2 45282.10 16 0
3 339720.00 46 1
4 1100.00 14 0
5 40000.00 13 0
6 15000.00 13 0
7 207820.00 36 1
8 497432.00 36 1
Timings
Using assign
will be slower because you have to copy the entire dataframe and then assign the new column. Using loc
will be faster.
Here are the timings on this small dataset:
%timeit df.assign(months=df.months.mask(df.amount.ge(2E5), np.maximum(df.months, 36)))
# 1000 loops, best of 3: 1.01 ms per loop
%timeit df.loc[(df['amount'] >= 200000) & (df['months'] < 36), 'months'] = 36
# 1000 loops, best of 3: 838 µs per loop
Upvotes: 3
Reputation: 294338
pd.Series.mask
allows you to mask values of the series in the position of the True
values passed in the cond
argument. You can also pass an other
argument to supply the value to replace the ones being masked. I used np.maximum
to generate the other
values.
df.assign(months=df.months.mask(df.amount.ge(2E5), np.maximum(df.months, 36)))
amount months checks
0 332374.00 36.0 1.0
1 22250.75 17.0 0.0
2 45282.10 16.0 0.0
3 339720.00 46.0 1.0
4 1100.00 14.0 0.0
5 40000.00 13.0 0.0
6 15000.00 13.0 0.0
7 207820.00 36.0 1.0
8 497432.00 36.0 1.0
See also pd.Series.where
as we could have performed the same task with:
df.assign(months=df.months.where(df.amount.lt(2E5), np.maximum(df.months, 36)))
amount months checks
0 332374.00 36.0 1.0
1 22250.75 17.0 0.0
2 45282.10 16.0 0.0
3 339720.00 46.0 1.0
4 1100.00 14.0 0.0
5 40000.00 13.0 0.0
6 15000.00 13.0 0.0
7 207820.00 36.0 1.0
8 497432.00 36.0 1.0
We can also perform this task using the same logic but utilizing numpy.where
a = df.amount.values
m = df.months.values
df.assign(months=np.where(a < 2E5, m, np.maximum(m, 36)))
amount months checks
0 332374.00 36.0 1.0
1 22250.75 17.0 0.0
2 45282.10 16.0 0.0
3 339720.00 46.0 1.0
4 1100.00 14.0 0.0
5 40000.00 13.0 0.0
6 15000.00 13.0 0.0
7 207820.00 36.0 1.0
8 497432.00 36.0 1.0
Upvotes: 3