Reputation: 923
I have a pandas data frame with customer transactions, as shown below, and want to create a column named 'Label' with 2 different values:
New Transaction performed before the end date of the previous transaction
New Transaction performed after the end date of the previous transaction
Input
Transaction ID  Transaction Start Date  Transaction End Date
1               23-jun-2014             15-Jul-2014
2               14-jul-2014             8-Aug-2014
3               13-Aug-2014             22-Aug-2014
4               21-Aug-2014             28-Aug-2014
5               29-Aug-2014             05-Sep-2014
6               06-Sep-2014             15-Sep-2014
Desired output
Transaction ID  Transaction Start Date  Transaction End Date  Label
1               23-jun-2014             15-Jul-2014
2               14-jul-2014             8-Aug-2014            New Transaction performed before end date of previous transaction
3               13-Aug-2014             22-Aug-2014           New Transaction after the end date of previous transaction.
4               21-Aug-2014             28-Aug-2014           New Transaction performed before the end date of previous transaction.
5               29-Aug-2014             05-Sep-2014           New Transaction after the end date of previous transaction.
6               06-Sep-2014             15-Sep-2014           New Transaction after the end date of previous transaction.
Upvotes: 0
Views: 228
Reputation: 863531
Use to_datetime first, then numpy.where with Series.lt to test whether each start date is less than the previous row's end date (obtained with Series.shift), and finally set the first value to an empty string:
import numpy as np
import pandas as pd

df['Transaction End Date'] = pd.to_datetime(df['Transaction End Date'])
df['Transaction Start Date'] = pd.to_datetime(df['Transaction Start Date'])
# True where a transaction starts before the previous row's end date
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()),
                       'New Transaction performed before end date of previous transaction',
                       'New Transaction after the end date of previous transaction.')
# the first row has no previous transaction (assumes the default 0-based index)
df.loc[0, 'Label'] = ''
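To see what the shifted comparison produces before any labels are attached, here is a minimal sketch on the question's six rows. The names `prev_end` and `overlaps` are only illustrative, the month casing and day padding have been normalised, and the explicit `format` string is an assumption about the date layout.

```python
import pandas as pd

# Sample data taken from the question (casing and padding normalised).
df = pd.DataFrame({
    "Transaction ID": [1, 2, 3, 4, 5, 6],
    "Transaction Start Date": ["23-Jun-2014", "14-Jul-2014", "13-Aug-2014",
                               "21-Aug-2014", "29-Aug-2014", "06-Sep-2014"],
    "Transaction End Date": ["15-Jul-2014", "08-Aug-2014", "22-Aug-2014",
                             "28-Aug-2014", "05-Sep-2014", "15-Sep-2014"],
})

# Parse both columns; the format string is an assumption about the input layout.
for col in ["Transaction Start Date", "Transaction End Date"]:
    df[col] = pd.to_datetime(df[col], format="%d-%b-%Y")

# shift() moves every end date down one row, so row i holds the end date of row i-1.
prev_end = df["Transaction End Date"].shift()

# True where a transaction starts before the previous one ended.
overlaps = df["Transaction Start Date"].lt(prev_end)

print(prev_end.isna().tolist())  # [True, False, False, False, False, False]
print(overlaps.tolist())         # [False, True, False, True, False, False]
```

The first row has nothing to compare against, so its shifted value is NaT and the comparison is False. That is why the first label has to be reset separately.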
Alternative solution:
m = df['Transaction Start Date'].lt(df['Transaction End Date'].shift())
df['Label'] = [''] + np.where(m,
'New Transaction performed before end date of previous transaction',
'New Transaction after the end date of previous transaction.')[1:].tolist()
print (df)
   Transaction ID  Transaction Start Date  Transaction End Date  \
0               1              2014-06-23            2014-07-15
1               2              2014-07-14            2014-08-08
2               3              2014-08-13            2014-08-22
3               4              2014-08-21            2014-08-28
4               5              2014-08-29            2014-09-05
5               6              2014-09-06            2014-09-15

                                               Label
0
1  New Transaction performed before end date of p...
2  New Transaction after the end date of previous...
3  New Transaction performed before end date of p...
4  New Transaction after the end date of previous...
5  New Transaction after the end date of previous...
Upvotes: 1
Reputation: 34086
Use numpy.where and Series.shift:
import numpy as np
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()),
                       'New Transaction performed before end date of previous transaction',
                       'New Transaction after the end date of previous transaction.')
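Two caveats apply to this one-liner. The date columns need to be real datetimes for the comparison to be meaningful. The first row is compared against a missing value, so it falls into the "after" branch rather than staying blank. One possible way to cover the blank first row in a single step is numpy.select, sketched below on the first three rows of the question's data. This swaps in a different function from the one used above, and the names `prev_end` and `before` are illustrative.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data, already parsed as datetimes.
df = pd.DataFrame({
    "Transaction Start Date": pd.to_datetime(["2014-06-23", "2014-07-14", "2014-08-13"]),
    "Transaction End Date": pd.to_datetime(["2014-07-15", "2014-08-08", "2014-08-22"]),
})

prev_end = df["Transaction End Date"].shift()
before = df["Transaction Start Date"].lt(prev_end)

# np.select uses the first condition that matches each row: rows without a
# previous transaction get '', overlapping rows get the "before" label,
# and everything else falls through to the default "after" label.
df["Label"] = np.select(
    [prev_end.isna(), before],
    ["", "New Transaction performed before end date of previous transaction"],
    default="New Transaction after the end date of previous transaction.",
)

print(df["Label"].tolist())
```

Because the missing-value check is listed first, the first row stays blank without relying on a particular index label.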
Upvotes: 1