Reputation: 5552
I'm dealing with futures data, where the current day starts before 00:00:00. I need to do resampling of 1 minute data to 1 hour data, taking into account the date offset. Let's see an example:
df1 - 1min data
Open High Low Close
2005-09-06 17:27:00 1234.75 1234.75 1234.75 1234.75
2005-09-06 17:28:00 1234.75 1234.75 1234.75 1234.75
2005-09-06 17:29:00 1234.75 1234.75 1234.75 1234.75
2005-09-06 17:30:00 1234.75 1234.75 1234.50 1234.50
2005-09-06 18:01:00 1234.50 1234.50 1234.25 1234.50
2005-09-06 18:02:00 1234.50 1234.50 1234.50 1234.50
2005-09-06 18:03:00 1234.50 1234.50 1234.50 1234.50
2005-09-06 18:04:00 1234.50 1234.50 1234.50 1234.50
2005-09-06 18:05:00 1234.50 1234.50 1234.25 1234.25
This is what happens with a regular resample:
conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
In [77]: normal_resample = df1.resample(rule='60Min', how=conversion)
In [78]: normal_resample
Out[79]:
Open High Low Close
2005-09-06 17:00:00 1234.75 1234.75 1234.50 1234.50
2005-09-06 18:00:00 1234.50 1234.50 1234.25 1234.25
This is the desired output accounting for the offset:
conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
In [77]: offset_resample = df1.resample(rule='60Min', how=conversion) + offset:18:00
In [78]: offset_resample
Out[79]:
Open High Low Close
2005-09-06 17:00:00 1234.75 1234.75 1234.50 1234.50
2005-09-07 18:00:00 1234.50 1234.50 1234.25 1234.25
Notice how what I want is specify that the new day starts at '18:00:00', not at midnight.
What have I done: I have a formula that includes the offset, resampling to daily data, but I don't know how to adapt it for intraday resampling.
def resample_day(df):
df.index = pd.DatetimeIndex(df.index)
df = df.between_time('18:00', '16:00', include_start=True, include_end=True)
proxy = df.index + pd.DateOffset(hours=6) # This is the piece that does the trick
result = df.groupby(proxy.date).agg(
{'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
result = result.reindex(columns=['Open', 'High', 'Low', 'Close'])
return result
Also, what is suggested here, doesn't work, for me at least.
Thanks for the input.
Upvotes: 3
Views: 1580
Reputation: 375475
You could just add on a day (if it the time is past 5pm):
In [11]: df1.index.time > datetime.time(17)
Out[11]: array([False, True], dtype=bool)
In [12]: df1.index + np.where((df1.index.time > datetime.time(17)), pd.offsets.Day(1).nanos, 0)
Out[12]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2005-09-06 17:00:00, 2005-09-07 18:00:00]
Length: 2, Freq: 60T, Timezone: None
Upvotes: 2