Aasheet Kumar
Aasheet Kumar

Reputation: 351

Efficient way to determine total time taking overlap into account

I am using pandas dataframe for the following:

I am trying to find the best way to determine the total time spent by a ship at a particular berth taking into account overlap in the duration of the visit. Here is what the data looks like:

 IN                    OUT                        BERTH 
2015-01-14 13:57:00   2015-01-15 17:15:00         01
2015-01-14 14:30:00   2015-01-15 02:50:00         01
2015-01-14 14:30:00   2015-01-16 06:10:00         01
2015-01-25 02:15:00   2015-01-26 13:41:00         01

what exactly i want is to find out is the total time a particular berth was used. So looking at the data there are overlaps and so i cant simply add the times for each record.

looking at the above data we can see that the 2nd ship timing is completely within the first so the time recorded for that would be 0, and that the 3rd ship comes before the first but stays till after the 1st one leaves so here the time would be = (out of 3rd ship - in of 1st) and then we would move to the next one as there is no overlap there and simply add [out of 4 - in of 4] to the total time spent on berth 1, and continue this till the end producing a something like this :

BERTH   HOURS WORKED
  01    7.750
  02    10.275
  03    5.585
  08    31.980

Upvotes: 1

Views: 408

Answers (1)

DYZ
DYZ

Reputation: 57033

Here's a solution for one berth. I hope you can expand it to multiple berths.

Split the dataframe into arrivals and departures:

dfIN = df[['IN']]
dfOUT = df[['OUT']]
dfIN['direction'] = 1
dfOUT['diretcion'] = -1

As of now, IN and OUT are just timestamps:

dfIN.columns = ('TS', 'direction')
dfOUT.columns = ('TS', 'direction')

Combine the two parts into one tall sorted dataframe of traffic:

traffic = pd.concat([dfIN, dfOUT]).sort_values('TS')
#                   TS  direction
#0 2015-01-14 13:57:00          1
#1 2015-01-14 14:30:00          1
#2 2015-01-14 14:30:00          1
#1 2015-01-15 02:50:00         -1
#0 2015-01-15 17:15:00         -1
#2 2015-01-16 06:10:00         -1
#3 2015-01-25 02:15:00          1
#3 2015-01-26 13:41:00         -1

Calculate the number of ships at the berth, as they arrive and depart:

traffic['ships'] = traffic['direction'].cumsum()

Identify time periods when the berth is empty. Then calculate the sequence number of each "busy period."

traffic['empty'] = (traffic['ships'] == 0).shift().fillna(0).astype(int)
traffic['busy_id'] = traffic['empty'].cumsum()
#                   TS  direction  ships  empty busy_id
#0 2015-01-14 13:57:00          1      1      0       0
#1 2015-01-14 14:30:00          1      2      0       0
#2 2015-01-14 14:30:00          1      3      0       0
#1 2015-01-15 02:50:00         -1      2      0       0
#0 2015-01-15 17:15:00         -1      1      0       0
#2 2015-01-16 06:10:00         -1      0      0       0
#3 2015-01-25 02:15:00          1      1      1       1
#3 2015-01-26 13:41:00         -1      0      0       1

Calculate the start and end of each "busy period":

busy_data = traffic.groupby('busy_id')['TS'].agg([min, max])
#                      min                 max
#busy_id                                        
#0     2015-01-14 13:57:00 2015-01-16 06:10:00
#1     2015-01-25 02:15:00 2015-01-26 13:41:00

Calculate the total length of all "busy periods":

(busy_data['max'] - busy_data['min']).sum()
#Timedelta('3 days 03:39:00')

Upvotes: 1

Related Questions