Reputation: 5475
Suppose we have a monthly time series, possibly with missing months, and upon loading the data into a pandas Series object with DatetimeIndex we wish to make sure each date observation is labeled as an end-of-month date. However, the raw input dates may fall anywhere in the month, so we need to force them to end-of-month observations.
My first thought was to do something like this:
import pandas as pd
pd.DatetimeIndex([datetime(2012,1,20), datetime(2012,7,31)], freq='M')
However, this just leaves the dates as is [2012-01-20,2012-07-31] and does not force them to end-of-month values [2012-01-31,2012-07-31].
My second attempt was:
ix = pd.DatetimeIndex([datetime(2012,1,20), datetime(2012,7,31)], freq='M')
s = pd.Series(np.random.randn(len(ix)), index=ix)
s.asfreq('M')
But this gives:
2012-01-31 NaN
2012-02-29 NaN
2012-03-31 NaN
2012-04-30 NaN
2012-05-31 NaN
2012-06-30 NaN
2012-07-31 0.79173
Freq: M
as under the hood the asfreq
function is calling date_range
for a DatetimeIndex.
This problem is easily solved if I'm using PeriodIndex
instead of DatetimeIndex
; however, I need to support some frequencies that are not currently supported by PeriodIndex
and as far as I know there is no way to extend pandas with my own Period
frequencies.
Upvotes: 1
Views: 1945
Reputation: 80456
It's a workaround, but it works without using periodindex
:
from pandas.tseries.offsets import *
In [164]: s
Out[164]:
2012-01-20 -1.266376
2012-07-31 -0.865573
In [165]: s.index=s.index+MonthEnd(n=0)
In [166]: s
Out[166]:
2012-01-31 -1.266376
2012-07-31 -0.865573
Upvotes: 1