Reputation: 3028
Supposedly, I have dataframe like below.
Date Time_Start Time_End Reason
0 2018-08-05 2018-08-05 10:10:00 2018-08-05 13:35:00 blah1
1 2018-08-05 2018-08-05 12:50:00 2018-08-05 14:26:00 blah2
2 2018-08-05 2018-08-05 16:40:00 2018-08-05 17:30:00 blah3
3 2018-08-05 2018-08-05 17:00:00 2018-08-05 17:10:00 blah4
4 2018-08-06 2018-08-06 09:40:00 2018-08-06 11:23:00 blah5
I would like to know if pandas is capable of merging rows that have time overlapped e.g. take "Time_Start" of row one and merge with "Time_End" of second row and append the reason to a list. Another case is to remove the forth row as it is a sub time window of the third row and appending the reasons to a list. So, my expected output would look like this
Date Time_Start Time_End Reason
0 2018-08-05 2018-08-05 10:10:00 2018-08-05 14:26:00 [blah1, blah2]
1 2018-08-05 2018-08-05 16:40:00 2018-08-05 17:30:00 [blah3, blah4]
3 2018-08-06 2018-08-06 09:40:00 2018-08-06 11:23:00 [blah5]
Note that there can be more than two rows of data which have time overlapping.
Upvotes: 1
Views: 166
Reputation: 323376
You can create a help key here
df['newid']=(df['Time_Start']-df['Time_End'].shift()).dt.total_seconds().gt(0).cumsum()
newdf=df.groupby(['Date','newid']).agg({'Time_Start':'min','Time_End':'max','Reason':lambda x : [x.values.tolist()]}).reset_index(level=1)
newdf.assign(Reason=newdf.Reason.str[0])
Out[592]:
newid Time_Start Reason Time_End
Date
2018-08-05 0 2018-08-05 10:10:00 [blah1, blah2] 2018-08-05 14:26:00
2018-08-05 1 2018-08-05 16:40:00 [blah3, blah4] 2018-08-05 17:30:00
2018-08-06 2 2018-08-06 09:40:00 [blah5] 2018-08-06 11:23:00
Upvotes: 2