Sreeram TP
Sreeram TP

Reputation: 11937

Compare two date columns and then based on difference set a flag in Pandas

I have a pandas dataframe with two datetime columns like shown below,

d1  d2
0   2016-07-05  2016-08-15
1   2016-09-14  2016-10-11
2   2016-10-12  2016-11-08
3   2016-11-09  2016-12-06
4   2016-12-07  2017-01-03
5   2017-01-04  2017-02-28
6   2017-03-01  2017-03-28
7   2017-04-03  2017-04-30
8   2017-05-03  2017-05-30
9   2017-06-02  2017-06-29
10  2019-04-25  2019-05-22

I want to make anothe column named flag based on the condition,

If the last d2 and d1 is seperated by more than 1 day, then set flag as last flag + 1. Otherwise use the same flag. For the first row the flag is always 1.

The result I am looking for is,

0     1
1     2
2     2
3     2
4     2
5     2
6     2
7     3
8     4
9     5
10    6

I am using this long code to achieve this,

(~test['d1'].sub(test['d2'].shift()).fillna(pd.Timedelta(days=0)).le(pd.Timedelta(days=1))).cumsum() + 1

I am worried about the efficency of this line of code. Is there any better alternative I can choose.?

Upvotes: 1

Views: 399

Answers (1)

ivallesp
ivallesp

Reputation: 2222

I would use shift in combination with assign and cumsum methods. Look at the following code:

df = (df
     .assign(d1 = lambda d: pd.to_datetime(d.d1),
             d2 = lambda d: pd.to_datetime(d.d2),
             difference = lambda d: (d.d1-d.d2.shift()).dt.days.abs(),
             flag = lambda d: 1+(d.difference>1).cumsum()))

Upvotes: 2

Related Questions