Reputation: 8778
Say I have the following time series, which starts on 2014-06-01, a Sunday.
In [7]:
import numpy.random as nr
import pandas as pd
# 2014-06-01 is a Sunday
df = pd.Series( index=pd.date_range( '2014-06-01', periods=30 ), data=nr.randn( 30 ) )
df
I can resample weekly, starting on Sundays and closing on Saturdays:
In [9]:
df.resample( 'W-SAT' ).mean()
Out[9]:
2014-06-07 0.119460
2014-06-14 0.464789
2014-06-21 -1.211579
2014-06-28 0.650210
2014-07-05 0.666044
Freq: W-SAT, dtype: float64
OK, now I want to do the same thing but every 2 weeks, so I try this:
In [11]:
df.resample( '2W-SAT' ).mean()
Out[11]:
2014-06-07 0.119460
2014-06-21 -0.373395
2014-07-05 0.653729
Freq: 2W-SAT, dtype: float64
Oh, the output is 1 week and then 2 weeks and 2 weeks. That's not what I expected. I was expecting the first index entry to be '2014-06-14'. Why is it doing that? How do I get the first 2 weeks to be resampled together?
Upvotes: 7
Views: 7453
Reputation: 8778
After trying the various options of resample, I might have an explanation. The way resample chooses the first entry of the new resampled index seems to depend on the closed option:
- with closed=left, resample looks for the latest possible start
- with closed=right, resample looks for the earliest possible start
I will illustrate with an example:
# 2014-06-01 is a Sunday
df = pd.Series( index=pd.date_range( '2014-06-01', periods=30 ), data=range( 1, 31 ) )
df
The following example illustrates the behaviour of closed=left. The latest "left-side" Saturday of a 2-week interval closed on the left falls on 2014-05-31, as shown by the following:
df.resample( '2W-SAT', closed='left', label='left' ).sum()
Out[119]:
2014-05-31 91
2014-06-14 287
2014-06-28 87
Freq: 2W-SAT, dtype: int64
The next example illustrates the behaviour of closed=right, which is the one that I didn't understand in my initial post (closed=right is the default in resample for weekly frequencies). The earliest "right-side" Saturday of a 2-week interval closed on the right falls on 2014-06-07, as shown by the following:
df.resample( '2W-SAT', closed='right', label='right' ).sum()
Out[122]:
2014-06-07 28
2014-06-21 203
2014-07-05 234
Freq: 2W-SAT, dtype: int64
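To check which days actually land in each bin, it can help to aggregate with min and max as well as sum. Because the values in this series are simply 1 to 30, they coincide with the day of June, so min and max show the first and last day of each bin. This is only a minimal sketch, assuming a recent pandas version in which the aggregation is called as a method on the resampler instead of being passed through how=; the variable names are mine.

```python
import pandas as pd

# Same data as above: values 1..30 on 2014-06-01..2014-06-30,
# so each value equals its day of the month.
s = pd.Series(range(1, 31), index=pd.date_range("2014-06-01", periods=30))

# closed='left': bins are [Saturday, Saturday + 14 days), labelled by their start.
left = s.resample("2W-SAT", closed="left", label="left").agg(["min", "max", "sum"])

# closed='right': bins are (Saturday - 14 days, Saturday], labelled by their end.
right = s.resample("2W-SAT", closed="right", label="right").agg(["min", "max", "sum"])

print(left)   # first bin covers June 1 to June 13
print(right)  # first bin covers June 1 to June 7 only
```

With closed=right the first bin is (2014-05-24, 2014-06-07], and only its second week overlaps the data, which is why the first row looks like a single week.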
Upvotes: 9
Reputation: 1059
The first Saturday of June 2014 is the 7th, so the index starts on the 7th. If you anchor on Sunday instead, it starts on the 1st of June, as expected.
df.resample( '2W-SUN' ).mean()
Out[11]:
2014-06-01 0.739895
2014-06-15 0.497950
2014-06-29 0.445480
2014-07-13 0.767430
Freq: 2W-SUN, dtype: float64
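One caveat: with the default closed=right, the first 2W-SUN bin ends on 2014-06-01 and therefore holds only that single day. A possible way to get the first two full weeks (Sunday 2014-06-01 through Saturday 2014-06-14) into one bin is to anchor on Sunday with closed=left and then relabel each bin by its closing Saturday. This is a sketch under assumptions: it uses a recent pandas version, an integer series 1 to 30 instead of the random data, and the 13-day relabelling is my own choice, not something resample does for you.

```python
import pandas as pd

s = pd.Series(range(1, 31), index=pd.date_range("2014-06-01", periods=30))

# Default closed='right': count the days per bin to see the one-day first bin.
default_counts = s.resample("2W-SUN").count()

# closed='left' gives bins [Sunday, Sunday + 14 days), labelled by the starting Sunday.
biweekly = s.resample("2W-SUN", closed="left", label="left").sum()

# Relabel each bin with its last day, the Saturday 13 days after the start.
biweekly.index = biweekly.index + pd.Timedelta(days=13)

print(default_counts)
print(biweekly)
```

The first entry of biweekly is then labelled 2014-06-14 and covers June 1 to June 14, which is the grouping asked for in the question.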
Upvotes: 0