Pandas: Rolling statistics with business hours

Question

I have the following table indexed with a timestamp.

The data is during business hours 8 AM - 11 PM but extends over multiple days

When I perform a rolling statistic like mean, and specify the period to be 24 hours, the window overlaps from January 5th back into January 4th.

Specifically, I am wondering how I can efficiently perform the rolling period to ONLY include the current day that is in the window.

My current inefficient method is to create a custom function that selects valid timestamp indices for each rolling calculation, but this is extremely slow.

def mean(x):
    x = x[(x.index.hour >= 8) & (x.index.hour <= 23)]
    return 100.0 * (sum(x) / (len(x)))

Index	Value	Normal	Desired
2021-01-04 08:35:15	0	0	0
2021-01-04 10:35:45	0	0	0
2021-01-04 16:35:30	1	0.333	0.333
2021-01-04 21:35:00	1	0.5	0.5
2021-01-05 08:15:00	1	0.6	1.0
2021-01-05 08:35:15	0	0.5	0.5
2021-01-05 12:35:42	0	0.428	0.333
2021-01-05 14:35:24	1	0.5	0.5
2021-01-04 20:35:23	0	0.444	0.4

dicristina · Accepted Answer

You can get the desired result by chaining a groupby operation with a windowing operation. According to the documentation this will "first group the data by the specified keys and then perform a windowing operation per group".

In [711]: df.groupby(lambda x: x.date()).rolling('1D').mean().reset_index(0, drop=True)
Out[711]:
                        Value
2021-01-04 08:35:15  0.000000
2021-01-04 10:35:45  0.000000
2021-01-04 16:35:30  0.333333
2021-01-04 21:35:00  0.500000
2021-01-05 08:15:00  1.000000
2021-01-05 08:35:15  0.500000
2021-01-05 12:35:42  0.333333
2021-01-05 14:35:24  0.500000
2021-01-05 20:35:23  0.400000

Pandas: Rolling statistics with business hours

Answers (1)

Related Questions