RealGecko

Reputation: 115

Generating "working hours" interval index with pandas

With pandas we can do something like this:

>>> import pandas
>>> i1 = pandas.Interval(pandas.Timestamp('2021-08-25 09:00:00'), pandas.Timestamp('2021-08-25 18:00:00'))
>>> i2 = pandas.Interval(pandas.Timestamp('2021-08-26 09:00:00'), pandas.Timestamp('2021-08-26 18:00:00'))
>>> ii = pandas.IntervalIndex([i1, i2])
>>> ii
IntervalIndex([(2021-08-25 09:00:00, 2021-08-25 18:00:00], (2021-08-26 09:00:00, 2021-08-26 18:00:00]],
              closed='right',
              dtype='interval[datetime64[ns]]')

This gives us an interval index of a person's working hours for two days. But it's tedious and not really DRY (imagine creating such an index for every working day in a month). Is it possible to do the same thing with less code? Maybe with the help of pandas.interval_range, just like we do with pandas.date_range:

>>> from datetime import date
>>> pandas.date_range(date(2021, 1, 1), date(2021, 7, 1), freq='B')
DatetimeIndex(['2021-01-01', '2021-01-04', '2021-01-05', '2021-01-06',
               '2021-01-07', '2021-01-08', '2021-01-11', '2021-01-12',
               '2021-01-13', '2021-01-14',
               ...
               '2021-06-18', '2021-06-21', '2021-06-22', '2021-06-23',
               '2021-06-24', '2021-06-25', '2021-06-28', '2021-06-29',
               '2021-06-30', '2021-07-01'],
              dtype='datetime64[ns]', length=130, freq='B')

Upvotes: 3

Views: 174

Answers (1)

tdy

Reputation: 41387

IntervalIndex.from_arrays accepts arrays for the left and right bounds, so you can generate those bounds with date_range:

import pandas as pd

# One timestamp per business day for the start (09:00) of each shift
in_times = pd.date_range('2021-08-25 09:00:00', '2021-09-25 09:00:00', freq='B')
out_times = pd.date_range('2021-08-25 18:00:00', '2021-09-25 18:00:00', freq='B')

ii = pd.IntervalIndex.from_arrays(left=in_times, right=out_times)
# IntervalIndex([(2021-08-25 09:00:00, 2021-08-25 18:00:00],
#                (2021-08-26 09:00:00, 2021-08-26 18:00:00],
#                (2021-08-27 09:00:00, 2021-08-27 18:00:00],
#                ...
#                (2021-09-22 09:00:00, 2021-09-22 18:00:00],
#                (2021-09-23 09:00:00, 2021-09-23 18:00:00],
#                (2021-09-24 09:00:00, 2021-09-24 18:00:00]],
#               dtype='interval[datetime64[ns], right]')
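
If the opening and closing hours are the same every day, the two date_range calls can be folded into one. The following is only a sketch of that idea: the helper name working_hours_index and its parameters open_hour and close_hour are hypothetical, not part of pandas. It builds the business days once and shifts them by a Timedelta to get each bound.

```python
import pandas as pd

def working_hours_index(start, end, open_hour=9, close_hour=18, closed='right'):
    """Hypothetical helper: one interval per business day between start and end."""
    # Midnight timestamps for each business day (Mon-Fri, holidays not excluded)
    days = pd.date_range(start, end, freq='B')
    # Shift the same days to the opening and closing times
    left = days + pd.Timedelta(hours=open_hour)
    right = days + pd.Timedelta(hours=close_hour)
    return pd.IntervalIndex.from_arrays(left, right, closed=closed)

ii = working_hours_index('2021-08-25', '2021-09-25')
print(ii[0])
print(len(ii))
```

The range here matches the one above, so the result should hold the same intervals; freq='B' skips weekends only, so public holidays would still need to be dropped separately.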

Note that by default, these intervals are only closed on the right:

# (2021-08-25 09:00:00, 2021-08-25 18:00:00]

So add closed='both' if desired:

ii = pd.IntervalIndex.from_arrays(left=in_times, right=out_times, closed='both')
# [2021-08-25 09:00:00, 2021-08-25 18:00:00]
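
To see what the closed setting changes in practice, here is a small check of whether the exact start of the working day counts as inside the interval. The variable names right_closed, both_closed and start_of_day are made up for this illustration.

```python
import pandas as pd

in_times = pd.date_range('2021-08-25 09:00:00', '2021-09-25 09:00:00', freq='B')
out_times = pd.date_range('2021-08-25 18:00:00', '2021-09-25 18:00:00', freq='B')

# Default: closed on the right only, so 09:00 itself is excluded
right_closed = pd.IntervalIndex.from_arrays(left=in_times, right=out_times)
# closed='both': 09:00 and 18:00 are both included
both_closed = pd.IntervalIndex.from_arrays(left=in_times, right=out_times, closed='both')

start_of_day = pd.Timestamp('2021-08-25 09:00:00')
print(start_of_day in right_closed[0])  # False
print(start_of_day in both_closed[0])   # True
```

So if someone clocking in at exactly 09:00 should count as being within working hours, closed='both' (or closed='left') is the setting to use.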

Upvotes: 2
