user8322222

Reputation: 529

Fill a column in a dataframe if a condition is met

I have the following dataframe:

PersonID  AmountPaid  PaymentReceivedDate  StartDate  withinNYears
1         100         2017                 2016
2         20          2014                 2014
1         30          2017                 2016
1         40          2016                 2016
4         300         2015                 2000
5         150         2005                 2002

What I want is for AmountPaid to appear in the withinNYears column if the payment was made within N years of the start date; otherwise the value should be NaN. N can be any number, but let's say 2 for this example (I will be varying it to explore the results).

So the dataframe above would come out like this if the amount was paid within 2 years:

PersonID  AmountPaid  PaymentReceivedDate  StartDate  withinNYears
1         100         2017                 2016       100
2         20          2014                 2014       20
1         30          2017                 2016       30
1         40          2016                 2016       40
4         300         2015                 2000       NaN
5         150         2005                 2002       NaN

Does anyone know how to achieve this? Cheers.

Upvotes: 1

Views: 1153

Answers (2)

jezrael

Reputation: 862481

Subtract the columns and compare the result with a scalar to get a boolean mask, then set the values with numpy.where, Series.where or DataFrame.loc:

import numpy as np

# boolean mask: True where the payment year is less than 2 years after the start year
m = (df['PaymentReceivedDate'] - df['StartDate']) < 2
df['withinNYears'] = np.where(m, df['AmountPaid'], np.nan)
# alternatives
# df['withinNYears'] = df['AmountPaid'].where(m)
# df.loc[m, 'withinNYears'] = df['AmountPaid']

print(df)
   PersonID  AmountPaid  PaymentReceivedDate  StartDate  \
0         1         100                 2017       2016   
1         2          20                 2014       2014   
2         1          30                 2017       2016   
3         1          40                 2016       2016   
4         4         300                 2015       2000   
5         5         150                 2005       2002   

   withinNYears  
0         100.0  
1          20.0  
2          30.0  
3          40.0  
4           NaN  
5           NaN
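
Since N is meant to vary, the mask can be wrapped in a small helper. This is only a minimal sketch, assuming both year columns hold plain integers as in the question; the function name fill_within_n_years and its n parameter are hypothetical, not part of pandas:

```python
import pandas as pd

def fill_within_n_years(df, n):
    """Hypothetical helper: return a copy of df where withinNYears holds
    AmountPaid if the payment year is less than n years after the start year."""
    out = df.copy()
    mask = (out['PaymentReceivedDate'] - out['StartDate']) < n
    # Series.where keeps values where mask is True and puts NaN elsewhere
    out['withinNYears'] = out['AmountPaid'].where(mask)
    return out

# sample data copied from the question
df = pd.DataFrame({
    'PersonID': [1, 2, 1, 1, 4, 5],
    'AmountPaid': [100, 20, 30, 40, 300, 150],
    'PaymentReceivedDate': [2017, 2014, 2017, 2016, 2015, 2005],
    'StartDate': [2016, 2014, 2016, 2016, 2000, 2002],
})

result = fill_within_n_years(df, 2)
print(result)
```

With n=2 this should reproduce the output above; with n=4 the 150 payment would also be kept, because 2005 - 2002 = 3.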

EDIT:

If the StartDate column contains datetimes:

m = (df['PaymentReceivedDate'] - df['StartDate'].dt.year) < 2
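
As a rough illustration of the datetime case, here is a small sketch. It assumes PaymentReceivedDate is still an integer year while StartDate has been parsed to datetimes; the specific dates are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'AmountPaid': [100, 300],
    'PaymentReceivedDate': [2017, 2015],
    # made-up dates; only the year part matters for the comparison
    'StartDate': pd.to_datetime(['2016-03-01', '2000-07-15']),
})

# .dt.year extracts the integer year from each datetime
m = (df['PaymentReceivedDate'] - df['StartDate'].dt.year) < 2
df['withinNYears'] = df['AmountPaid'].where(m)
print(df)
```

If PaymentReceivedDate were a datetime column too, the same .dt.year accessor could be applied to it before subtracting.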

Upvotes: 3

BENY

Reputation: 323226

Just assign using loc:

df.loc[(df['PaymentReceivedDate'] - df['StartDate']) < 2, 'withinNYears'] = df['AmountPaid']
df
Out[37]: 
   PersonID  AmountPaid      ...       StartDate  withinNYears
0         1         100      ...            2016         100.0
1         2          20      ...            2014          20.0
2         1          30      ...            2016          30.0
3         1          40      ...            2016          40.0
4         4         300      ...            2000           NaN
5         5         150      ...            2002           NaN
[6 rows x 5 columns]
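
One thing to watch with a plain "< 2" comparison: a payment dated before the start year gives a negative difference, which also passes the test. Whether such rows should count is not stated in the question, so the following is only a sketch under the assumption that they should be excluded. It uses Series.between, which is inclusive on both ends by default; the second row is made up to show the effect:

```python
import pandas as pd

df = pd.DataFrame({
    'AmountPaid': [100, 50],
    'PaymentReceivedDate': [2017, 2010],  # second payment precedes the start year (made-up row)
    'StartDate': [2016, 2016],
})

n = 2
diff = df['PaymentReceivedDate'] - df['StartDate']
# keep only rows with 0 <= diff <= n - 1, i.e. paid on or after the start year
# and less than n years later
df.loc[diff.between(0, n - 1), 'withinNYears'] = df['AmountPaid']
print(df)
```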

Upvotes: 3
