Consider the following code:
import numpy as np
import pandas
from datetime import datetime
r = pandas.date_range(datetime(2014, 5, 26), datetime(2014, 6, 6))
ts = pandas.Series(np.random.randn(len(r)), index=r)
print(ts.asfreq(pandas.DateOffset(days=5), how='end'))
I think I'm misunderstanding how the "how" parameter should be used. With the code above, I expected asfreq to count every 5 days starting from the end of the range. Instead I get:
2014-05-26 0.456856
2014-05-31 -0.552287
2014-06-05 0.169554
Freq: <DateOffset: kwds={'days': 5}>, dtype: float64
If I use how='start' instead:
print(ts.asfreq(pandas.DateOffset(days=5), how='start'))
it makes no difference and I get exactly the same result.
I then noticed that the documentation of pandas.Series.asfreq says:
how : {‘start’, ‘end’}, default end
For PeriodIndex only, see PeriodIndex.asfreq
This points to the problem, since in my case I need to use a DatetimeIndex.
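A quick check that confirms this: for a DatetimeIndex, "how" has no effect, and the 5-day grid is always counted forward from the first timestamp. This sketch swaps the random data for deterministic values (np.arange) so the output is reproducible; as far as I can tell, for a DatetimeIndex Series.asfreq builds a date_range from the first to the last timestamp at the requested frequency and reindexes onto it.

```python
import numpy as np
import pandas as pd

r = pd.date_range("2014-05-26", "2014-06-06")
# Deterministic values instead of randn so the output can be checked.
ts = pd.Series(np.arange(len(r), dtype=float), index=r)

end = ts.asfreq(pd.DateOffset(days=5), how="end")
start = ts.asfreq(pd.DateOffset(days=5), how="start")

# For a DatetimeIndex the "how" argument is ignored: both calls return
# the same points, counted forward from the first timestamp (2014-05-26).
print(end)
print(end.equals(start))
```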
So my question is: what is the proper call, in my example, to always return a series whose last point falls on, say, 2014-05-30 for a range ending on 2014-06-06, no matter which start date I set for the range? asfreq doesn't seem to work with descending indexes, so reversing the index doesn't seem to be an option either.
To answer your question, several issues need to be addressed:
First of all, I don't quite see the point of using DateOffset here; you can simply replace it with "5D" and get the same result.
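A small sketch of that equivalence, again using deterministic values in place of the random data:

```python
import numpy as np
import pandas as pd

r = pd.date_range("2014-05-26", "2014-06-06")
ts = pd.Series(np.arange(len(r), dtype=float), index=r)

with_offset = ts.asfreq(pd.DateOffset(days=5))
with_alias = ts.asfreq("5D")

# Both produce the same timestamps and the same values.
print(with_alias)
print(with_offset.index.equals(with_alias.index))
print((with_offset.values == with_alias.values).all())
```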
Secondly, a better practice is to use period_range to generate your time index. An example is shown after the third point.
Lastly, it seems you don't quite understand what asfreq does when the "how" option is used. In a nutshell, "how" only matters for a PeriodIndex, when the "freq" passed to asfreq is a higher frequency (i.e. shorter time intervals) than the current one. Let me illustrate this with an example:
import pandas as pd
import numpy as np
rng = pd.period_range('20140526','20140606')
If I set "how" to "start":
print(rng.asfreq('H', how='start'))
The result is:
PeriodIndex(['2014-05-26 00:00', '2014-05-27 00:00', '2014-05-28 00:00',
'2014-05-29 00:00', '2014-05-30 00:00', '2014-05-31 00:00',
'2014-06-01 00:00', '2014-06-02 00:00', '2014-06-03 00:00',
'2014-06-04 00:00', '2014-06-05 00:00', '2014-06-06 00:00'],
dtype='int64', freq='H')
Every timestamp is set to 00:00 of its day.
However, if "how" is set to "end":
print(rng.asfreq('H', how='end'))
The result becomes:
PeriodIndex(['2014-05-26 23:00', '2014-05-27 23:00', '2014-05-28 23:00',
'2014-05-29 23:00', '2014-05-30 23:00', '2014-05-31 23:00',
'2014-06-01 23:00', '2014-06-02 23:00', '2014-06-03 23:00',
'2014-06-04 23:00', '2014-06-05 23:00', '2014-06-06 23:00'],
dtype='int64', freq='H')
Every timestamp is set to 23:00, in other words, the last hour of each day.
So the point here is: "how" is useful only when the new frequency (hourly) is higher than the old one (daily). Even then, you are not subsetting or resampling your time series; you are just giving each existing point a new index label at the new frequency.
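The same behaviour can be sketched with monthly and daily periods (used here instead of hours only as an illustration): "how" picks the start or end of the finer period, has no effect when going to a coarser frequency, and Series.asfreq on a PeriodIndex keeps the same number of points.

```python
import pandas as pd

may = pd.Period("2014-05", freq="M")

# Converting to a finer frequency: "how" picks which day of the month to use.
print(may.asfreq("D", how="start"))  # first day of May 2014
print(may.asfreq("D", how="end"))    # last day of May 2014

# Converting to a coarser frequency: each day lies in exactly one month,
# so "start" and "end" give the same result.
day = pd.Period("2014-05-26", freq="D")
print(day.asfreq("M", how="start") == day.asfreq("M", how="end"))

# Series.asfreq on a PeriodIndex only relabels the index; the number of
# points and their values stay the same.
s = pd.Series([1.0, 2.0, 3.0], index=pd.period_range("2014-04", periods=3, freq="M"))
print(s.asfreq("D", how="end"))
```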
As for achieving your goal: since you already know all the important time points in advance, why not create a period index from those time points directly:
r = pd.period_range('20140530','20140609',freq="5D")
print(r)
PeriodIndex(['2014-05-30', '2014-06-04', '2014-06-09'], dtype='int64', freq='5D')
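If you would rather not list the dates by hand, one alternative (a different technique from asfreq) is to build the 5-day grid backwards from an anchor date with pd.date_range(end=..., periods=...) and reindex onto it, so the last point is fixed no matter where the range begins. every_n_days_from_end is a hypothetical helper written for illustration, assuming a gap-free daily DatetimeIndex and an anchor that lies within the data's range.

```python
import numpy as np
import pandas as pd

def every_n_days_from_end(ts, n, anchor=None):
    """Hypothetical helper: sample ts every n days, counting back from anchor.

    Assumes anchor lies between the first and last timestamps of ts;
    it defaults to the last timestamp.
    """
    if anchor is None:
        anchor = ts.index[-1]
    anchor = pd.Timestamp(anchor)
    # Number of grid points that fit between the first timestamp and the anchor.
    steps = (anchor - ts.index[0]).days // n + 1
    # With end= and periods=, date_range lays the points out so the last one is the anchor.
    grid = pd.date_range(end=anchor, periods=steps, freq=f"{n}D")
    return ts.reindex(grid)

r = pd.date_range("2014-05-26", "2014-06-06")
ts = pd.Series(np.arange(len(r), dtype=float), index=r)

print(every_n_days_from_end(ts, 5))                       # ends on 2014-06-06
print(every_n_days_from_end(ts, 5, anchor="2014-05-30"))  # ends on 2014-05-30
```

For a gap-free daily index, positional slicing from the end, ts.iloc[::-5].iloc[::-1], selects the same points as the first call.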