ElRudi

Reputation: 2324

Pandas: resample hourly values to monthly values with offset

I want to aggregate a pandas.Series with an hourly DatetimeIndex to monthly values, while taking into account an offset from midnight.

Example

Consider the following (uniform) time series, which spans about 1.5 months.

import pandas as pd
hours = pd.Series(1, pd.date_range('2020-02-23 06:00', freq='h', periods=1008))
hours
# 2020-02-23 06:00:00    1
# 2020-02-23 07:00:00    1
#                       ..
# 2020-04-05 04:00:00    1
# 2020-04-05 05:00:00    1
# Freq: h, Length: 1008, dtype: int64

I would like to sum these into months, taking into account that in this use case each day starts at 06:00. The result should be:

2020-02-01 06:00:00    168
2020-03-01 06:00:00    744
2020-04-01 06:00:00     96
Freq: MS, dtype: int64
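As a sanity check, the expected numbers can be reproduced by plain arithmetic: count the hours between consecutive "month starts" at 06:00 on the 1st, clipped to the range of the data. This is only a sketch that hard-codes the boundaries for this particular example; it is not a general solution.

```python
import pandas as pd

# First timestamp of the series and its length in hours (from the example)
start = pd.Timestamp('2020-02-23 06:00')
n_hours = 1008
# Exclusive end: one hour after the last timestamp (2020-04-05 05:00)
end = start + pd.Timedelta(hours=n_hours)

# "Months" that begin at 06:00 on the 1st, clipped to the data range
edges = [
    start,
    pd.Timestamp('2020-03-01 06:00'),
    pd.Timestamp('2020-04-01 06:00'),
    end,
]

# Number of hourly timestamps in each half-open [edge_i, edge_i+1) interval
counts = [int((b - a) / pd.Timedelta(hours=1)) for a, b in zip(edges, edges[1:])]
print(counts)       # [168, 744, 96]
print(sum(counts))  # 1008
```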

How do I do that?


What I've tried and what works

Upvotes: 5

Views: 1166

Answers (1)

Haleemur Ali

Reputation: 28243

Not too much of an improvement on your attempt, but you could write the resampling as

months = hours.resample('D', offset='06:00:00').sum().resample('MS').sum()

Changing the index labels still requires the workaround you've been using, i.e. adding the time delta manually and setting freq to 'MS'.
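Put together, the two-step resample plus the manual relabelling might look like the sketch below. It assumes a recent pandas (1.1 or later, where resample accepts offset) and uses the lowercase 'h' hourly alias; reassigning freq='MS' relies on pandas accepting month-start timestamps that carry a 06:00 time component, which it did in the versions I'm aware of.

```python
import pandas as pd

hours = pd.Series(1, pd.date_range('2020-02-23 06:00', freq='h', periods=1008))
offset = pd.Timedelta('06:00:00')

# Step 1: daily bins starting at 06:00 (offset is honoured for 'D')
# Step 2: group those daily sums into calendar months
months = hours.resample('D', offset=offset).sum().resample('MS').sum()

# Relabelling workaround: move the month labels to 06:00 and restore freq
months.index = months.index + offset
months.index.freq = 'MS'
print(months)
```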

Note that you can pass a string representation of the time delta (such as '06:00:00') to offset.
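A small check that the string form and an explicit pd.Timedelta describe the same shift, assuming pandas 1.1 or later:

```python
import pandas as pd

hours = pd.Series(1, pd.date_range('2020-02-23 06:00', freq='h', periods=1008))

# Offset given as a string ...
daily_str = hours.resample('D', offset='06:00:00').sum()
# ... and as an explicit Timedelta; both mean "days start at 06:00"
daily_td = hours.resample('D', offset=pd.Timedelta(hours=6)).sum()

print(daily_str.equals(daily_td))  # True
print(daily_str.index[0])          # 2020-02-23 06:00:00
```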

Two resampling operations are needed because the offset is ignored when the resampling frequency is coarser than 'D', such as 'MS' (pandas only applies origin and offset to fixed-length frequencies like days, hours and minutes). Once you have resampled to the daily level with the offset, the result can be resampled further without specifying the offset.
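The effect is easy to see by comparing a month-start resample with and without the offset. This is a sketch assuming pandas 1.1 or later; the numbers are simply the hours per calendar (midnight-to-midnight) month in the example data.

```python
import pandas as pd

hours = pd.Series(1, pd.date_range('2020-02-23 06:00', freq='h', periods=1008))

# Month-start bins with and without an offset
with_offset = hours.resample('MS', offset='06:00:00').sum()
without_offset = hours.resample('MS').sum()

# For 'MS' the offset has no effect: both use plain calendar months,
# so February loses 6 hours and April gains them
print(with_offset.equals(without_offset))  # True
print(without_offset.tolist())             # [162, 744, 102]
```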

I believe this is buggy behaviour, and I agree with you that hours.resample('MS', offset='06:00:00').sum() should produce the expected result.

Essentially, there are two issues:

  1. The binning is incorrect when an offset is applied and the frequency is coarser than 'D': the offset is ignored.
  2. The offset is not reflected in the final output: the labels are truncated to the start or end of the period. I'm not sure whether the behaviour you're expecting can be generalized for all users.
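Both issues can also be sidestepped in a single resample by a different trick, which the answer above does not use: shift the timestamps back by the offset, bin by calendar month, then shift the labels forward again. The helper name resample_monthly_with_offset is hypothetical, and this is a sketch that assumes a timezone-naive index (DST transitions are not considered).

```python
import pandas as pd


def resample_monthly_with_offset(s: pd.Series, offset: str) -> pd.Series:
    """Hypothetical helper: sum `s` into months that start `offset` after midnight."""
    delta = pd.Timedelta(offset)
    # 1. Move every timestamp back, so 06:00 on the 1st lands on midnight
    shifted = s.copy()
    shifted.index = s.index - delta
    # 2. Ordinary calendar-month binning (addresses the binning issue)
    monthly = shifted.resample('MS').sum()
    # 3. Move the labels forward again (addresses the labelling issue)
    monthly.index = monthly.index + delta
    return monthly


hours = pd.Series(1, pd.date_range('2020-02-23 06:00', freq='h', periods=1008))
result = resample_monthly_with_offset(hours, '06:00:00')
print(result)
```

Compared with the two-step resample, this avoids the intermediate daily series, at the cost of temporarily working with shifted timestamps.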

There is a related bug issue affecting resampling with offsets. I had not initially determined whether it and the issue you face have the same root cause; it turns out they do.

Upvotes: 1
