Reputation: 11937
I have a pandas dataframe with two datetime columns like shown below,
d1 d2
0 2016-07-05 2016-08-15
1 2016-09-14 2016-10-11
2 2016-10-12 2016-11-08
3 2016-11-09 2016-12-06
4 2016-12-07 2017-01-03
5 2017-01-04 2017-02-28
6 2017-03-01 2017-03-28
7 2017-04-03 2017-04-30
8 2017-05-03 2017-05-30
9 2017-06-02 2017-06-29
10 2019-04-25 2019-05-22
I want to make anothe column named flag
based on the condition,
If the last d2 and d1 is seperated by more than 1 day, then set flag as last flag + 1. Otherwise use the same flag. For the first row the flag is always 1
.
The result I am looking for is,
0 1
1 2
2 2
3 2
4 2
5 2
6 2
7 3
8 4
9 5
10 6
I am using this long code to achieve this,
(~test['d1'].sub(test['d2'].shift()).fillna(pd.Timedelta(days=0)).le(pd.Timedelta(days=1))).cumsum() + 1
I am worried about the efficency of this line of code. Is there any better alternative I can choose.?
Upvotes: 1
Views: 399
Reputation: 2222
I would use shift in combination with assign and cumsum methods. Look at the following code:
df = (df
.assign(d1 = lambda d: pd.to_datetime(d.d1),
d2 = lambda d: pd.to_datetime(d.d2),
difference = lambda d: (d.d1-d.d2.shift()).dt.days.abs(),
flag = lambda d: 1+(d.difference>1).cumsum()))
Upvotes: 2