Dwight Temple
Dwight Temple

Reputation: 47

Pandas: Rolling statistics with business hours

I have the following table indexed with a timestamp.

The data is during business hours 8 AM - 11 PM but extends over multiple days

When I perform a rolling statistic like mean, and specify the period to be 24 hours, the window overlaps from January 5th back into January 4th.

Specifically, I am wondering how I can efficiently perform the rolling period to ONLY include the current day that is in the window.

My current inefficient method is to create a custom function that selects valid timestamp indices for each rolling calculation, but this is extremely slow.

def mean(x):
    x = x[(x.index.hour >= 8) & (x.index.hour <= 23)]
    return 100.0 * (sum(x) / (len(x)))

Index Value Normal Desired
2021-01-04 08:35:15 0 0 0
2021-01-04 10:35:45 0 0 0
2021-01-04 16:35:30 1 0.333 0.333
2021-01-04 21:35:00 1 0.5 0.5
2021-01-05 08:15:00 1 0.6 1.0
2021-01-05 08:35:15 0 0.5 0.5
2021-01-05 12:35:42 0 0.428 0.333
2021-01-05 14:35:24 1 0.5 0.5
2021-01-04 20:35:23 0 0.444 0.4

Upvotes: 0

Views: 380

Answers (1)

dicristina
dicristina

Reputation: 425

You can get the desired result by chaining a groupby operation with a windowing operation. According to the documentation this will "first group the data by the specified keys and then perform a windowing operation per group".

In [711]: df.groupby(lambda x: x.date()).rolling('1D').mean().reset_index(0, drop=True)
Out[711]:
                        Value
2021-01-04 08:35:15  0.000000
2021-01-04 10:35:45  0.000000
2021-01-04 16:35:30  0.333333
2021-01-04 21:35:00  0.500000
2021-01-05 08:15:00  1.000000
2021-01-05 08:35:15  0.500000
2021-01-05 12:35:42  0.333333
2021-01-05 14:35:24  0.500000
2021-01-05 20:35:23  0.400000

Upvotes: 1

Related Questions