Reputation: 529
I have the following dataframe:
PersonID AmountPaid PaymentReceivedDate StartDate withinNYears
1 100 2017 2016
2 20 2014 2014
1 30 2017 2016
1 40 2016 2016
4 300 2015 2000
5 150 2005 2002
What I'm looking for is the Amount Paid should appear in the withNYears column if the payment was made within n years of start date otherwise you get NaN. N years can be any number but let's say 2 for this example (as I will be playing with this to see findings).
so basically the above dataframe would come out like this if the amount was paid within 2 years:
PersonID AmountPaid PaymentReceivedDate StartDate withinNYears
1 100 2017 2016 100
2 20 2014 2014 20
1 30 2017 2016 30
1 40 2016 2016 40
4 300 2015 2000 NaN
5 150 2005 2002 NaN
does anyone know how to achieve this? cheers.
Upvotes: 1
Views: 1153
Reputation: 862481
Subtract columns and compare by scalar for boolean mask and then set value by numpy.where
, Series.where
or DataFrame.loc
:
m = (df['PaymentReceivedDate'] - df['StartDate']) < 2
df['withinNYears'] = np.where(m, df['AmountPaid'], np.nan)
#alternatives
#df['withinNYears'] = df['AmountPaid'].where(m)
#df.loc[m, 'withinNYears'] = df['AmountPaid']
print (df)
PersonID AmountPaid PaymentReceivedDate StartDate \
0 1 100 2017 2016
1 2 20 2014 2014
2 1 30 2017 2016
3 1 40 2016 2016
4 4 300 2015 2000
5 5 150 2005 2002
withinNYears
0 100.0
1 20.0
2 30.0
3 40.0
4 NaN
5 NaN
EDIT:
If StartDate
column have datetimes:
m = (df['PaymentReceivedDate'] - df['StartDate'].dt. year) < 2
Upvotes: 3
Reputation: 323226
Just do with assign using loc
df.loc[(df['PaymentReceivedDate'] - df['StartDate']<2),'withinNYears']=df.AmountPaid
df
Out[37]:
PersonID AmountPaid ... StartDate withinNYears
0 1 100 ... 2016 100.0
1 2 20 ... 2014 20.0
2 1 30 ... 2016 30.0
3 1 40 ... 2016 40.0
4 4 300 ... 2000 NaN
5 5 150 ... 2002 NaN
[6 rows x 5 columns]
Upvotes: 3