James MacAdie

Reputation: 599

Pandas aggregate values by years but keep original TimeSeries index

I have a time series of monthly data. I'd like to sum the values by year while keeping the original monthly index. This is probably best illustrated by an example:

import numpy as np
import pandas as pd

# April 2012 to Nov 2053
dates = pd.date_range('2012-04-01', periods=500, freq='MS')

# Increasing integer series (0 to 499) over the date range
a = pd.Series(np.arange(500), index=dates)

# Almost works but I'm missing the last 7 months:
# May 2053 to Nov 2053
b = a.resample('AS-APR', how='sum').resample('MS', fill_method='pad')

Any idea how I can get b to contain the full 500 time periods, including the missing last 7 months? They need to be pad-filled from the April 2053 value.

Upvotes: 1

Views: 2770

Answers (1)

Viktor Kerkez

Reputation: 46566

Use reindex instead:

b = a.resample('AS-APR', how='sum').reindex(a.index, method='pad')

This way you will get the same index as the original Series object, padded as you wanted.
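Newer pandas releases no longer accept the how= keyword on resample, so the one-liner above may need adapting. The following is a minimal sketch of the same reindex idea in method-chaining style. It assumes the question's sample series, and it uses the offset object pd.offsets.YearBegin(month=4) as a stand-in for the 'AS-APR' alias; the names yearly and b are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 500 monthly points starting April 2012
dates = pd.date_range('2012-04-01', periods=500, freq='MS')
a = pd.Series(np.arange(500), index=dates)

# Yearly bins starting each April (assumed equivalent of the 'AS-APR' alias)
yearly = a.resample(pd.offsets.YearBegin(month=4)).sum()

# Carry each yearly total forward onto the original monthly index
b = yearly.reindex(a.index, method='pad')
```

Because the target index is a.index itself, b has all 500 periods, and May 2053 to Nov 2053 take the total of the bin that starts in April 2053.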

The problem with the double resample is that the first resample leaves April 2053 as the last entry, so the second resample ends at 2053-04-01. The second step resamples correctly; it is the first step that moves the end date from November back to April.
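The truncation can be checked directly. This is a small sketch written against the newer method-chaining resample API (an assumption about the pandas version in use), with illustrative names yearly, upsampled and missing:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-04-01', periods=500, freq='MS')
a = pd.Series(np.arange(500), index=dates)

# First resample: the last label is the start of the final April-based year
yearly = a.resample(pd.offsets.YearBegin(month=4)).sum()

# Second resample: it can only span the yearly index, so it stops at that label
upsampled = yearly.resample('MS').ffill()

# Months present in the original index but absent after the round trip
missing = a.index.difference(upsampled.index)
```

Here yearly.index[-1] is 2053-04-01, and missing holds the seven months from May 2053 to Nov 2053.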

If you want a different frequency from the original series, you can use the same method again:

b = a.resample('AS-APR', how='sum').reindex(a.resample('D').index, method='pad')
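In pandas versions where resample returns a lazy Resampler object, a.resample('D').index may not be available. One possible workaround, sketched here as an assumption rather than the original method, is to build the daily target index with pd.date_range over the same span; daily_index and b_daily are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-04-01', periods=500, freq='MS')
a = pd.Series(np.arange(500), index=dates)

# Yearly totals for years starting in April
yearly = a.resample(pd.offsets.YearBegin(month=4)).sum()

# Daily index covering the same span as the original monthly series
daily_index = pd.date_range(a.index[0], a.index[-1], freq='D')

# Pad each yearly total forward onto every day of its year
b_daily = yearly.reindex(daily_index, method='pad')
```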

Upvotes: 1
