Georg Heiler

python time interval overlap duration

My question is similar to "Efficient date range overlap calculation in python?"; however, I need to calculate the overlap using full timestamps rather than days. More importantly, the second interval is not given as a specific date range, only as a start and end time of day (hour and minute).

import pandas as pd
import numpy as np

df = pd.DataFrame({'first_ts': {0: np.datetime64('2020-01-25 07:30:25.435000'),
  1: np.datetime64('2020-01-25 07:25:00')},
 'last_ts': {0: np.datetime64('2020-01-25 07:30:25.718000'),
  1: np.datetime64('2020-01-25 07:25:00')}})
df['start_hour'] = 7
df['start_minute'] = 0
df['end_hour'] = 8
df['end_minute'] = 0
print(df)  # display(df) only works inside Jupyter/IPython

How can I calculate, in milliseconds, the overlap duration between the interval (first_ts, last_ts) and the second interval defined by the start_hour/start_minute and end_hour/end_minute columns? Potentially, I would need to construct timestamps for the hour-based interval on each day and then calculate the overlap.
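The per-day idea in the last sentence can be sketched with the standard interval-overlap rule, overlap = max(0, min(end1, end2) - max(start1, start2)), applied once for every calendar day the interval touches. This is a minimal scalar sketch, not code from the original post; the helper names `overlap` and `daily_window_overlap_ms` are hypothetical, plain `datetime`/`time` objects are assumed, and the daily window is assumed not to cross midnight.

```python
from datetime import date, datetime, time, timedelta


def overlap(a_start: datetime, a_end: datetime,
            b_start: datetime, b_end: datetime) -> timedelta:
    """Length of the intersection of two intervals; zero if they are disjoint."""
    return max(timedelta(0), min(a_end, b_end) - max(a_start, b_start))


def daily_window_overlap_ms(first_ts: datetime, last_ts: datetime,
                            start: time, end: time) -> float:
    """Milliseconds of (first_ts, last_ts) that fall inside the daily window (start, end).

    Hypothetical helper; assumes start < end, i.e. the window does not cross midnight.
    """
    total = timedelta(0)
    day: date = first_ts.date()
    while day <= last_ts.date():
        # Turn the hour-based window into concrete timestamps on this calendar day.
        w_start = datetime.combine(day, start)
        w_end = datetime.combine(day, end)
        total += overlap(first_ts, last_ts, w_start, w_end)
        day += timedelta(days=1)
    # Dividing two timedeltas yields a float.
    return total / timedelta(milliseconds=1)


print(daily_window_overlap_ms(datetime(2020, 1, 25, 7, 30, 25, 435000),
                              datetime(2020, 1, 25, 7, 30, 25, 718000),
                              time(7, 0), time(8, 0)))  # 283.0
```

Because the loop visits every day between the two dates, an interval from 07:30 on one day to 07:30 the next collects 30 minutes from each day's 07:00-08:00 window.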

Answers (1)

jezrael

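The idea is to create new Series with the window's start and end datetimes, taking the dates from the datetime columns, then use numpy.minimum and numpy.maximum, subtract, convert the timedeltas with Series.dt.total_seconds, and multiply by 1000. Clipping at 0 makes non-overlapping intervals return 0 instead of a negative duration:

# zero-pad hours and minutes so the strings match '%H:%M' exactly
s = (df['first_ts'].dt.strftime('%Y-%m-%d ') +
     df['start_hour'].astype(str).str.zfill(2) + ':' +
     df['start_minute'].astype(str).str.zfill(2))
e = (df['last_ts'].dt.strftime('%Y-%m-%d ') +
     df['end_hour'].astype(str).str.zfill(2) + ':' +
     df['end_minute'].astype(str).str.zfill(2))

s = pd.to_datetime(s, format='%Y-%m-%d %H:%M')
e = pd.to_datetime(e, format='%Y-%m-%d %H:%M')

# overlap = earliest end - latest start; clip(lower=0) maps disjoint intervals to 0
df['inter'] = ((np.minimum(e, df['last_ts']) -
                np.maximum(s, df['first_ts'])).dt.total_seconds() * 1000).clip(lower=0)
print(df)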
                 first_ts                 last_ts  start_hour  start_minute  \
0 2020-01-25 07:30:25.435 2020-01-25 07:30:25.718           7             0   
1 2020-01-25 07:25:00.000 2020-01-25 07:25:00.000           7             0   

   end_hour  end_minute  inter  
0         8           0  283.0  
1         8           0    0.0  
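A variant that avoids string formatting builds the window bounds from `Series.dt.normalize()` (midnight of each date) plus `pd.to_timedelta`, and takes the element-wise max/min with `Series.where`. This is an alternative sketch, not the answer's original code; the third sample row (09:00 to 09:30, outside the 07:00-08:00 window) is a hypothetical addition to show that disjoint intervals come out as 0 after clipping.

```python
import pandas as pd

df = pd.DataFrame({
    'first_ts': pd.to_datetime(['2020-01-25 07:30:25.435', '2020-01-25 07:25:00.000',
                                '2020-01-25 09:00:00.000']),
    'last_ts': pd.to_datetime(['2020-01-25 07:30:25.718', '2020-01-25 07:25:00.000',
                               '2020-01-25 09:30:00.000']),
})
df['start_hour'] = 7
df['start_minute'] = 0
df['end_hour'] = 8
df['end_minute'] = 0

# Window bounds on the same calendar day as each timestamp (midnight + hours + minutes).
s = (df['first_ts'].dt.normalize()
     + pd.to_timedelta(df['start_hour'], unit='h')
     + pd.to_timedelta(df['start_minute'], unit='min'))
e = (df['last_ts'].dt.normalize()
     + pd.to_timedelta(df['end_hour'], unit='h')
     + pd.to_timedelta(df['end_minute'], unit='min'))

# Element-wise max of the starts and min of the ends.
latest_start = s.where(s > df['first_ts'], df['first_ts'])
earliest_end = e.where(e < df['last_ts'], df['last_ts'])

# Dividing by a 1 ms Timedelta gives float milliseconds; clip negatives (no overlap) to 0.
overlap_ms = (earliest_end - latest_start) / pd.Timedelta(milliseconds=1)
df['inter'] = overlap_ms.clip(lower=0)
print(df[['first_ts', 'last_ts', 'inter']])
```

Dividing by `pd.Timedelta(milliseconds=1)` rather than multiplying `total_seconds()` by 1000 keeps the conversion in integer nanoseconds until the final division.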

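Another idea is to use only np.minimum on the two interval lengths. Note that this matches the true overlap only when one interval lies entirely inside the other (as in the sample data); for partially overlapping or disjoint intervals it overstates the overlap:

df['inter'] = (np.minimum(df['last_ts'] - df['first_ts'], e - s).dt.total_seconds() * 1000)
print(df)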
                 first_ts                 last_ts  start_hour  start_minute  \
0 2020-01-25 07:30:25.435 2020-01-25 07:30:25.718           7             0   
1 2020-01-25 07:25:00.000 2020-01-25 07:25:00.000           7             0   

   end_hour  end_minute  inter  
0         8           0  283.0  
1         8           0    0.0  
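To see when the length-based shortcut differs from the true overlap, here is a hypothetical pair of partially overlapping intervals using plain `datetime` objects: 07:45-08:15 against a 07:00-08:00 window. The function names are illustrative only.

```python
from datetime import datetime, timedelta


def true_overlap(first, last, s, e):
    # Earliest end minus latest start, never negative.
    return max(timedelta(0), min(e, last) - max(s, first))


def min_of_lengths(first, last, s, e):
    # The shortcut: the shorter of the two interval lengths.
    return min(last - first, e - s)


first, last = datetime(2020, 1, 25, 7, 45), datetime(2020, 1, 25, 8, 15)
s, e = datetime(2020, 1, 25, 7, 0), datetime(2020, 1, 25, 8, 0)
print(true_overlap(first, last, s, e))    # 0:15:00
print(min_of_lengths(first, last, s, e))  # 0:30:00
```

Only the 15 minutes between 07:45 and 08:00 actually overlap, while the shortcut reports the full 30-minute length of the shorter interval.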
