bensw
bensw

Reputation: 3028

Merge rows that have time overlapping - Pandas

Supposedly, I have dataframe like below.

    Date    Time_Start  Time_End    Reason
0   2018-08-05  2018-08-05 10:10:00     2018-08-05 13:35:00     blah1
1   2018-08-05  2018-08-05 12:50:00     2018-08-05 14:26:00     blah2
2   2018-08-05  2018-08-05 16:40:00     2018-08-05 17:30:00     blah3
3   2018-08-05  2018-08-05 17:00:00     2018-08-05 17:10:00     blah4
4   2018-08-06  2018-08-06 09:40:00     2018-08-06 11:23:00     blah5

I would like to know if pandas is capable of merging rows that have time overlapped e.g. take "Time_Start" of row one and merge with "Time_End" of second row and append the reason to a list. Another case is to remove the forth row as it is a sub time window of the third row and appending the reasons to a list. So, my expected output would look like this

    Date    Time_Start  Time_End    Reason
0   2018-08-05  2018-08-05 10:10:00     2018-08-05 14:26:00     [blah1, blah2]
1   2018-08-05  2018-08-05 16:40:00     2018-08-05 17:30:00     [blah3, blah4]
3   2018-08-06  2018-08-06 09:40:00     2018-08-06 11:23:00     [blah5] 

Note that there can be more than two rows of data which have time overlapping.

Upvotes: 1

Views: 166

Answers (1)

BENY
BENY

Reputation: 323376

You can create a help key here

df['newid']=(df['Time_Start']-df['Time_End'].shift()).dt.total_seconds().gt(0).cumsum()

newdf=df.groupby(['Date','newid']).agg({'Time_Start':'min','Time_End':'max','Reason':lambda x : [x.values.tolist()]}).reset_index(level=1)
newdf.assign(Reason=newdf.Reason.str[0])
Out[592]: 
            newid          Time_Start          Reason            Time_End
Date                                                                     
2018-08-05      0 2018-08-05 10:10:00  [blah1, blah2] 2018-08-05 14:26:00
2018-08-05      1 2018-08-05 16:40:00  [blah3, blah4] 2018-08-05 17:30:00
2018-08-06      2 2018-08-06 09:40:00         [blah5] 2018-08-06 11:23:00

Upvotes: 2

Related Questions