aeapen

Reputation: 923

How to compare two dates across consecutive rows in a pandas DataFrame and create a new column

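I have a pandas DataFrame of customer transactions, shown below. I want to create a column named 'Label' that takes one of two values, depending on whether each transaction starts before or after the end date of the previous transaction.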

Input

Transaction ID    Transaction Start Date    Transaction End Date
      1               23-jun-2014              15-Jul-2014
      2               14-jul-2014              8-Aug-2014
      3               13-Aug-2014              22-Aug-2014
      4               21-Aug-2014              28-Aug-2014
      5               29-Aug-2014              05-Sep-2014
      6               06-Sep-2014              15-Sep-2014

Desired output

Transaction ID    Transaction Start Date    Transaction End Date    Label
      1               23-jun-2014              15-Jul-2014
      2               14-jul-2014              8-Aug-2014           New Transaction performed before end date of previous transaction
      3               13-Aug-2014              22-Aug-2014          New Transaction after the end date of previous transaction.
      4               21-Aug-2014              28-Aug-2014          New Transaction performed before the end date of previous transaction.
      5               29-Aug-2014              05-Sep-2014          New Transaction after the end date of previous transaction.
      6               06-Sep-2014              15-Sep-2014          New Transaction after the end date of previous transaction.

Upvotes: 0

Views: 228

Answers (2)

jezrael

Reputation: 863531

Convert both columns with to_datetime first, then use numpy.where with Series.lt to test whether each start date is less than the previous row's end date (obtained with Series.shift). Finally, set the first row's label to an empty string:

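import numpy as np
import pandas as pd

# Parse the date strings so the comparison is chronological, not lexicographic
df['Transaction End Date'] = pd.to_datetime(df['Transaction End Date'])
df['Transaction Start Date'] = pd.to_datetime(df['Transaction Start Date'])

# shift() moves each end date down one row, so row i is compared with the end date of row i-1
df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()), 
                       'New Transaction performed before end date of previous transaction', 
                       'New Transaction after the end date of previous transaction.')
# The first row has no previous transaction, so blank its label
df.loc[0, 'Label'] = ''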

Alternative solution:

m = df['Transaction Start Date'].lt(df['Transaction End Date'].shift())

df['Label'] = [''] + np.where(m, 
              'New Transaction performed before end date of previous transaction', 
              'New Transaction after the end date of previous transaction.')[1:].tolist()

print (df)
   Transaction ID Transaction Start Date Transaction End Date  \
0               1             2014-06-23           2014-07-15   
1               2             2014-07-14           2014-08-08   
2               3             2014-08-13           2014-08-22   
3               4             2014-08-21           2014-08-28   
4               5             2014-08-29           2014-09-05   
5               6             2014-09-06           2014-09-15   

                                               Label  
0                                                     
1  New Transaction performed before end date of p...  
2  New Transaction after the end date of previous...  
3  New Transaction performed before end date of p...  
4  New Transaction after the end date of previous...  
5  New Transaction after the end date of previous...  
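
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that builds the sample frame from the question and applies the same shift-and-compare idea. The explicit format="%d-%b-%Y" and the BEFORE/AFTER constant names are assumptions added for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Label texts taken from the answer above; the constant names are illustrative only
BEFORE = 'New Transaction performed before end date of previous transaction'
AFTER = 'New Transaction after the end date of previous transaction.'

# Sample data copied from the question
df = pd.DataFrame({
    'Transaction ID': [1, 2, 3, 4, 5, 6],
    'Transaction Start Date': ['23-jun-2014', '14-jul-2014', '13-Aug-2014',
                               '21-Aug-2014', '29-Aug-2014', '06-Sep-2014'],
    'Transaction End Date': ['15-Jul-2014', '8-Aug-2014', '22-Aug-2014',
                             '28-Aug-2014', '05-Sep-2014', '15-Sep-2014'],
})

# Assumption: every date follows day-month abbreviation-year, so state the format explicitly
for col in ['Transaction Start Date', 'Transaction End Date']:
    df[col] = pd.to_datetime(df[col], format='%d-%b-%Y')

# End date of the previous row; NaT for the first row
prev_end = df['Transaction End Date'].shift()

# True where a transaction starts before the previous one ended
starts_early = df['Transaction Start Date'].lt(prev_end)

df['Label'] = np.where(starts_early, BEFORE, AFTER)

# The first row has nothing to compare against, so leave its label empty
df.loc[df.index[0], 'Label'] = ''

print(df['Label'].tolist())
```

Using df.index[0] instead of a literal 0 keeps the last step working if the frame does not have the default integer index.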

Upvotes: 1

Mayank Porwal

Reputation: 34086

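Use numpy.where and Series.shift. This assumes both date columns are already datetimes (for example, converted with pd.to_datetime); compared as strings, the dates would be ordered lexicographically:

import numpy as np

df['Label'] = np.where(df['Transaction Start Date'].lt(df['Transaction End Date'].shift()),
                       'New Transaction performed before end date of previous transaction',
                       'New Transaction after the end date of previous transaction.')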
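
One thing to watch with the one-liner: the first row has no previous transaction, so shift() yields NaT there, the comparison evaluates to False, and that row receives the "after" label rather than the blank shown in the desired output. A possible variant, sketched here with numpy.select on a shortened three-row frame, handles that case in the same pass. The BEFORE/AFTER names and the reduced sample are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Label texts from the answers; the constant names are illustrative only
BEFORE = 'New Transaction performed before end date of previous transaction'
AFTER = 'New Transaction after the end date of previous transaction.'

# First three transactions from the question, already as datetimes
df = pd.DataFrame({
    'Transaction Start Date': pd.to_datetime(['2014-06-23', '2014-07-14', '2014-08-13']),
    'Transaction End Date': pd.to_datetime(['2014-07-15', '2014-08-08', '2014-08-22']),
})

prev_end = df['Transaction End Date'].shift()

# Conditions are checked in order; the first one that matches wins
conditions = [
    prev_end.isna().to_numpy(),                              # no previous transaction
    df['Transaction Start Date'].lt(prev_end).to_numpy(),    # starts before previous end
]
choices = ['', BEFORE]

# Rows matching neither condition fall through to the default label
df['Label'] = np.select(conditions, choices, default=AFTER)

print(df['Label'].tolist())
```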

Upvotes: 1
