Reputation: 599
I have a time series of monthly data. I'd like to aggregate the values by year (summing them) but keep the original TimeSeries index. This is probably best illustrated by example:
import numpy as np
import pandas as pd

# April 2012 to Nov 2053
dates = pd.date_range('2012-04-01', periods=500, freq='MS')
# Sample time series over the date range (values 0..499)
a = pd.Series(np.arange(500), index=dates)
# Almost works but I'm missing the last 7 months:
# May 2053 to Nov 2053
b = a.resample('AS-APR', how='sum').resample('MS', fill_method='pad')
Any idea how I can get b to contain the full 500 time periods, including the missing last 7 months? They need to be pad filled from the value in April 2053.
Upvotes: 1
Views: 2770
Reputation: 46566
Use reindex instead:
b = a.resample('AS-APR', how='sum').reindex(a.index, method='pad')
This way you will get the same index as the original Series object, padded as you wanted.
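For readers on a newer pandas, where the how= and fill_method= arguments to resample are no longer available, here is a minimal sketch of the same idea. It assumes the setup from the question; the names annual and b are just for illustration, and an offset object is used in place of the 'AS-APR' alias so it does not depend on which alias spelling a given version accepts.

```python
import numpy as np
import pandas as pd

# Same setup as in the question: 500 monthly values, April 2012 to Nov 2053
dates = pd.date_range('2012-04-01', periods=500, freq='MS')
a = pd.Series(np.arange(500), index=dates)

# Sum by year, with each year starting in April
annual = a.resample(pd.offsets.YearBegin(month=4)).sum()

# Forward-fill the annual totals onto the original monthly index
b = annual.reindex(a.index, method='ffill')

print(len(b))       # same length as a
print(b.index[-1])  # same last timestamp as a
```

Every month then carries the total of the April-to-March year it falls in, including the partial final year.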
The problem with resample is that after you first resample a, the last entry becomes April 2053, so the second resampling ends at 2053-04-01. The second resample did its job correctly; it was the first one that moved the end date from November to April.
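To see the truncation for yourself, a small check along these lines should do (a sketch using the method-chaining resample API of newer pandas; annual and monthly are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-04-01', periods=500, freq='MS')
a = pd.Series(np.arange(500), index=dates)

# First resample: one row per April-based year
annual = a.resample(pd.offsets.YearBegin(month=4)).sum()

# Second resample: back to month starts, forward-filled
monthly = annual.resample(pd.offsets.MonthBegin()).ffill()

print(annual.index[-1])         # last label after the first resample
print(monthly.index[-1])        # the second resample cannot go past that label
print(len(a) - len(monthly))    # number of months lost at the end
```

The second resample only knows about the range covered by the annual index, so anything after its last label is gone.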
If you want a different frequency than the original series, you can use the same method with a different target index:
b = a.resample('AS-APR', how='sum').reindex(a.resample('D').index, method='pad')
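In newer pandas, resample returns a Resampler object rather than a Series, so it may not expose an index to reindex against. One way around that, as a sketch, is to build the daily index directly with pd.date_range over the span of a (daily_index and b_daily are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2012-04-01', periods=500, freq='MS')
a = pd.Series(np.arange(500), index=dates)

# Annual totals, years starting in April
annual = a.resample(pd.offsets.YearBegin(month=4)).sum()

# Daily index covering the same span as the original series
daily_index = pd.date_range(a.index[0], a.index[-1], freq='D')

# Forward-fill the annual totals onto the daily index
b_daily = annual.reindex(daily_index, method='ffill')

print(b_daily.index[0], b_daily.index[-1])
```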
Upvotes: 1