FrankData
FrankData

Reputation: 21

pd.date_range how to exclude few hours

Hi guy I had a question about using the pd.date_range(). I am doing ARIMA model and in one step I need to forecast some price values.For example, at time 2021-01-04 11:20. I would like to generate next 4 date index with freq = '5Min', so I write the following code

pd.date_range(start = '2021-01-04 11:20', periods = 5, freq = '5Min')

And this gives

['2021-01-04 11:20', '2021-01-04 11:25', '2021-01-04 11:30', '2021-01-04 11:35', '2021-01-04 11:40']

But the market opens in the afternoon. So after 11:30, the market will open at '2021-01-04 15:00', so the series should be.

['2021-01-04 11:20', '2021-01-04 11:25', '2021-01-04 15:00', '2021-01-04 15:05', '2021-01-04 15:10'].

So How to custom the freq so that I could exclude some 'hour-ranges' during a day?

Plz thanks! I really appreciate it!

Upvotes: 2

Views: 890

Answers (2)

jezrael
jezrael

Reputation: 863341

Use DatetimeIndex.indexer_between_time for positions and then filter out this values by np.isin in boolean indexing:

r = pd.date_range(start = '2021-01-04 00:00', periods = 100, freq = '30Min')

ind = (r.indexer_between_time('11:30','13:30').tolist() +
       r.indexer_between_time('15:00','21:00').tolist() +
       r.indexer_between_time('23:00','09:00').tolist())
# print (ind)

out = r[np.isin(np.arange(len(r)), ind, invert=True)]
print (out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-04 14:00:00', '2021-01-04 14:30:00',
               '2021-01-04 21:30:00', '2021-01-04 22:00:00',
               '2021-01-04 22:30:00', '2021-01-05 09:30:00',
               '2021-01-05 10:00:00', '2021-01-05 10:30:00',
               '2021-01-05 11:00:00', '2021-01-05 14:00:00',
               '2021-01-05 14:30:00', '2021-01-05 21:30:00',
               '2021-01-05 22:00:00', '2021-01-05 22:30:00'],
              dtype='datetime64[ns]', freq=None)

Another idea is use masks:

from datetime import time

r = pd.date_range(start = '2021-01-04 00:00', periods = 100, freq = '30Min')

m = ((r.time > time(hour=9, minute=0)) & (r.time < time(hour=11, minute=30)) |
     (r.time > time(hour=13, minute=30)) & (r.time < time(hour=15, minute=0)) |
     (r.time > time(hour=21, minute=0)) & (r.time < time(hour=23, minute=0)))
       
print (m)

out = r[m]
print (out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-04 14:00:00', '2021-01-04 14:30:00',
               '2021-01-04 21:30:00', '2021-01-04 22:00:00',
               '2021-01-04 22:30:00', '2021-01-05 09:30:00',
               '2021-01-05 10:00:00', '2021-01-05 10:30:00',
               '2021-01-05 11:00:00', '2021-01-05 14:00:00',
               '2021-01-05 14:30:00', '2021-01-05 21:30:00',
               '2021-01-05 22:00:00', '2021-01-05 22:30:00'],
              dtype='datetime64[ns]', freq=None)

Next alternative with numpy.r_ for join indices and filter by them:

ind1 = (np.r_[r.indexer_between_time('9:00','11:30', include_start=False, include_end=False),
              r.indexer_between_time('13:30','15:00', include_start=False, include_end=False),
              r.indexer_between_time('21:00','23:00', include_start=False, include_end=False)])

out = r[ind1]
print (out)
DatetimeIndex(['2021-01-04 09:30:00', '2021-01-04 10:00:00',
               '2021-01-04 10:30:00', '2021-01-04 11:00:00',
               '2021-01-05 09:30:00', '2021-01-05 10:00:00',
               '2021-01-05 10:30:00', '2021-01-05 11:00:00',
               '2021-01-04 14:00:00', '2021-01-04 14:30:00',
               '2021-01-05 14:00:00', '2021-01-05 14:30:00',
               '2021-01-04 21:30:00', '2021-01-04 22:00:00',
               '2021-01-04 22:30:00', '2021-01-05 21:30:00',
               '2021-01-05 22:00:00', '2021-01-05 22:30:00'],
              dtype='datetime64[ns]', freq=None)

Upvotes: 2

Simon
Simon

Reputation: 1211

I don't know what other time limits you face, but could you just adjust it with a condition and a list comprehension? I do not believe pd.date_range has any default parameter that can do what you're asking for.

# setup
dt_range = pd.date_range(start = '2021-01-04 11:20', periods = 5, freq = '5Min')
# time condition
market_open = "11:30"
# list comprehension
dt_range = [time + pd.DateOffset(hours=3, minutes=30) if time.strftime('%H:%M') >= market_open else time for time in dt_range]
# convert back to panda time series
dt_range = pd.to_datetime(dt_range)

print(dt_range)

output:

DatetimeIndex(['2021-01-04 11:20:00', '2021-01-04 11:25:00',
               '2021-01-04 15:00:00', '2021-01-04 15:05:00',
               '2021-01-04 15:10:00'],
              dtype='datetime64[ns]', freq=None)

Upvotes: 1

Related Questions