Change starting and ending hour of pandas timestamp

Question

I am dealing with a dataset where observations occur between opening and closing hours -- but the service closes on the day after it opens. For example, opening occurs at 7am and closing at 1am, the following day.

This feels like a very common problem -- I've searched around for it and am open to the fact I might just not know the correct terms to search for.

For most of my uses it's enough to do something like:

   open_close = pd.DatetimeIndex(start='2012-01-01 05:00:00', periods = 15, offset='D')

Then I can just do fun little groupbys on the df:
df.groupby(open_close.asof).agg(func).

But I've run into an instance where I need to grab multiple of these open-close periods. What I really want to be able to do is just have an DatetimeIndex where I get to pick when an day starts. So I could just redefine 'day' to be from 5AM to 5AM. The nice thing about this is I can then use things like df[df.index.dayofweek == 6] and get back everything from 5AM on Sunday to 5AM on Monda.

It feels like Periods...or something inside of pandas anticipated this request. Would love help figuring it out.

EDIT:

I've also figured this out via creating another column with the right day
df['shift_day'] = df['datetime'].apply(magicFunctionToFigureOutOpenClose)
-- so this isn't blocking my progress. Just feels like something that could be nicely integrated into the package (or datetime...or somewhere...)

Garrett · Accepted Answer

Perhaps the base parameter of df.resample() would help:

base : int, default 0
    For frequencies that evenly subdivide 1 day, the "origin" of the
    aggregated intervals. For example, for '5min' frequency, base could
    range from 0 through 4. Defaults to 0

Here's an example:

In [44]: df = pd.DataFrame(np.random.rand(28),
....:           index=pd.DatetimeIndex(start='2012/9/1', periods=28, freq='H'))

In [45]: df
Out[45]: 
                            0
2012-09-01 00:00:00  0.970273
2012-09-01 01:00:00  0.730171
2012-09-01 02:00:00  0.508588
2012-09-01 03:00:00  0.535351
2012-09-01 04:00:00  0.940255
2012-09-01 05:00:00  0.143483
2012-09-01 06:00:00  0.792659
2012-09-01 07:00:00  0.231413
2012-09-01 08:00:00  0.071676
2012-09-01 09:00:00  0.995202
2012-09-01 10:00:00  0.236551
2012-09-01 11:00:00  0.904853
2012-09-01 12:00:00  0.652873
2012-09-01 13:00:00  0.488400
2012-09-01 14:00:00  0.396647
2012-09-01 15:00:00  0.967261
2012-09-01 16:00:00  0.554188
2012-09-01 17:00:00  0.884086
2012-09-01 18:00:00  0.418577
2012-09-01 19:00:00  0.189584
2012-09-01 20:00:00  0.577041
2012-09-01 21:00:00  0.100332
2012-09-01 22:00:00  0.294672
2012-09-01 23:00:00  0.925425
2012-09-02 00:00:00  0.630807
2012-09-02 01:00:00  0.400261
2012-09-02 02:00:00  0.156469
2012-09-02 03:00:00  0.658608

In [46]: df.resample("24H", how=sum, label='left', closed='left', base=5)
Out[46]: 
                             0
2012-08-31 05:00:00   3.684638
2012-09-01 05:00:00  11.671068

In [47]: df.ix[:5].sum()
Out[47]: 0    3.684638

In [48]: df.ix[5:].sum()
Out[48]: 0    11.671068

Change starting and ending hour of pandas timestamp

Answers (1)

Related Questions