Reputation: 1667
I am dealing with a dataset where observations occur between opening and closing hours -- but the service closes on the day after it opens. For example, opening occurs at 7am and closing at 1am, the following day.
This feels like a very common problem -- I've searched around for it and am open to the fact I might just not know the correct terms to search for.
For most of my uses it's enough to do something like:
open_close = pd.DatetimeIndex(start='2012-01-01 05:00:00', periods = 15, offset='D')
Then I can just do fun little groupbys on the df:
df.groupby(open_close.asof).agg(func).
But I've run into an instance where I need to grab multiple of these open-close periods. What I really want to be able to do is just have an DatetimeIndex where I get to pick when an day starts. So I could just redefine 'day' to be from 5AM to 5AM. The nice thing about this is I can then use things like df[df.index.dayofweek == 6]
and get back everything from 5AM on Sunday to 5AM on Monda.
It feels like Periods...or something inside of pandas anticipated this request. Would love help figuring it out.
EDIT:
I've also figured this out via creating another column with the right day
df['shift_day'] = df['datetime'].apply(magicFunctionToFigureOutOpenClose)
-- so this isn't blocking my progress. Just feels like something that could be nicely integrated into the package (or datetime...or somewhere...)
Upvotes: 2
Views: 1347
Reputation: 49876
Perhaps the base
parameter of df.resample() would help:
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0
Here's an example:
In [44]: df = pd.DataFrame(np.random.rand(28),
....: index=pd.DatetimeIndex(start='2012/9/1', periods=28, freq='H'))
In [45]: df
Out[45]:
0
2012-09-01 00:00:00 0.970273
2012-09-01 01:00:00 0.730171
2012-09-01 02:00:00 0.508588
2012-09-01 03:00:00 0.535351
2012-09-01 04:00:00 0.940255
2012-09-01 05:00:00 0.143483
2012-09-01 06:00:00 0.792659
2012-09-01 07:00:00 0.231413
2012-09-01 08:00:00 0.071676
2012-09-01 09:00:00 0.995202
2012-09-01 10:00:00 0.236551
2012-09-01 11:00:00 0.904853
2012-09-01 12:00:00 0.652873
2012-09-01 13:00:00 0.488400
2012-09-01 14:00:00 0.396647
2012-09-01 15:00:00 0.967261
2012-09-01 16:00:00 0.554188
2012-09-01 17:00:00 0.884086
2012-09-01 18:00:00 0.418577
2012-09-01 19:00:00 0.189584
2012-09-01 20:00:00 0.577041
2012-09-01 21:00:00 0.100332
2012-09-01 22:00:00 0.294672
2012-09-01 23:00:00 0.925425
2012-09-02 00:00:00 0.630807
2012-09-02 01:00:00 0.400261
2012-09-02 02:00:00 0.156469
2012-09-02 03:00:00 0.658608
In [46]: df.resample("24H", how=sum, label='left', closed='left', base=5)
Out[46]:
0
2012-08-31 05:00:00 3.684638
2012-09-01 05:00:00 11.671068
In [47]: df.ix[:5].sum()
Out[47]: 0 3.684638
In [48]: df.ix[5:].sum()
Out[48]: 0 11.671068
Upvotes: 1