Reputation: 1913
So I start out with a pd.Series called jpm, and I would like to group it into weeks and take the last value from each week. The code below works and does get the last value, but it changes the corresponding index to the Sunday of each week, and I would like it to leave the index unchanged.
import pandas_datareader.data as web
import pandas as pd
start = pd.datetime(2015, 11, 1)
end = pd.datetime(2015, 11, 17)
raw_jpm = web.DataReader("JPM", 'yahoo', start, end)["Adj Close"]
jpm = raw_jpm.ix[raw_jpm.index[::2]]
jpm is now
Date
2015-11-02 64.125610
2015-11-04 64.428918
2015-11-06 66.982593
2015-11-10 66.219427
2015-11-12 64.575682
2015-11-16 65.074678
Name: Adj Close, dtype: float64
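(In case the Yahoo endpoint no longer works, the same series can be rebuilt directly from the printed values above; a minimal reconstruction:)
import pandas as pd
jpm = pd.Series(
    [64.125610, 64.428918, 66.982593, 66.219427, 64.575682, 65.074678],
    index=pd.to_datetime(['2015-11-02', '2015-11-04', '2015-11-06',
                          '2015-11-10', '2015-11-12', '2015-11-16']),
    name='Adj Close')
jpm.index.name = 'Date'  # match the index name shown above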
I want to do some operations to it, such as
weekly = jpm.groupby(pd.TimeGrouper('W')).last()
weekly is now
Date
2015-11-08 66.982593
2015-11-15 64.575682
2015-11-22 65.074678
Freq: W-SUN, Name: Adj Close, dtype: float64
which is great, except all my dates got changed. The output I want is:
Date
2015-11-06 66.982593
2015-11-12 64.575682
2015-11-16 65.074678
Upvotes: 2
Views: 1081
Reputation: 1913
It seems a little tricky to do this in pure pandas, so I used numpy
import numpy as np
weekly = jpm.groupby(pd.TimeGrouper('W-SUN')).last()
weekly.index = jpm.index[np.searchsorted(jpm.index, weekly.index, side="right")-1]
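To see why this works: the grouped index holds the Sunday that closes each week, and np.searchsorted(jpm.index, weekly.index, side="right") - 1 gives, for each Sunday, the position of the last original date on or before it. A minimal sketch of the same idea, assuming a recent pandas where pd.Grouper stands in for the removed pd.TimeGrouper:
import numpy as np
import pandas as pd

# weekly last values, labelled by the Sunday that closes each week
weekly = jpm.groupby(pd.Grouper(freq='W')).last()

# position of the last original date on or before each Sunday label
pos = np.searchsorted(jpm.index, weekly.index, side='right') - 1   # here: [2, 4, 5]

weekly.index = jpm.index[pos]   # 2015-11-06, 2015-11-12, 2015-11-16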
Upvotes: 1
Reputation: 29711
You could provide a DateOffset by using the Week class with weekday=4 (Monday: 0 → Sunday: 6), which corresponds to the weekly frequency W-FRI:
jpm.groupby(pd.TimeGrouper(freq=pd.offsets.Week(weekday=4))).last().tail(5)
Date
2016-08-19 65.860001
2016-08-26 66.220001
2016-09-02 67.489998
2016-09-09 66.650002
2016-09-16 65.820000
Freq: W-FRI, Name: Adj Close, dtype: float64
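(pd.TimeGrouper was later deprecated and removed; assuming a current pandas release, pd.Grouper accepts the same frequency, a sketch:)
# 'W-FRI' is the string alias for pd.offsets.Week(weekday=4)
jpm.groupby(pd.Grouper(freq='W-FRI')).last().tail(5)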
If you want the starting date to be the next Monday after the start date and the ending date to be the previous Sunday before the end date, you could do it this way:
from datetime import datetime, timedelta
start = datetime(2015, 11, 1)
monday = start + timedelta(days=(7 - start.weekday()))
end = datetime(2016, 9, 30)
sunday = end - timedelta(days=end.weekday() + 1)
print (monday)
2015-11-02 00:00:00
print (sunday)
2016-09-25 00:00:00
Then, use it as:
jpm = web.DataReader('JPM', 'yahoo', monday, sunday)["Adj Close"]
jpm.groupby(pd.TimeGrouper(freq='7D')).last()
To get the labels all on Sundays (since you specified the range Monday → Sunday, with Sunday being the last day to be considered), you could use a small hack:
monday_new = monday - timedelta(days=3)
jpm = web.DataReader('JPM', 'yahoo', monday_new, sunday)["Adj Close"]
jpm.groupby(pd.TimeGrouper(freq='W')).last().head()
Date
2015-11-01 62.863448
2015-11-08 66.982593
2015-11-15 64.145175
2015-11-22 66.082449
2015-11-29 65.720431
Freq: W-SUN, Name: Adj Close, dtype: float64
Now that you've posted the desired output, you can arrive at the result using the transform method instead of taking the aggregated last, so that it returns an object with the same index as the one being grouped.
import numpy as np
df = jpm.groupby(pd.TimeGrouper(freq='W')).transform('last').reset_index(name='Last')
df
df['counter'] = (df['Last'].shift() != df['Last']).astype(int).cumsum()
df.groupby(['Last','counter'])['Date'].apply(lambda x: np.array(x)[-1]) \
.reset_index().set_index('Date').sort_index()['Last']
Date
2015-11-06 66.982593
2015-11-12 64.575682
2015-11-16 65.074678
Name: Last, dtype: float64
Note: This can handle repeated values that occur on two separate dates, because the counter column bins them into two separate buckets.
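For the series in the question, the intermediate df (after the counter column is added) looks roughly like this, which makes the bucketing visible:
        Date       Last  counter
0 2015-11-02  66.982593        1
1 2015-11-04  66.982593        1
2 2015-11-06  66.982593        1
3 2015-11-10  64.575682        2
4 2015-11-12  64.575682        2
5 2015-11-16  65.074678        3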
Upvotes: 1
Reputation: 210832
You can do it this way:
In [15]: jpm
Out[15]:
Date
2015-11-02 64.125610
2015-11-04 64.428918
2015-11-06 66.982593
2015-11-10 66.219427
2015-11-12 64.575682
2015-11-16 65.074678
Name: Adj Close, dtype: float64
In [16]: jpm.groupby(jpm.index.week).transform('last').drop_duplicates(keep='last')
Out[16]:
Date
2015-11-06 66.982593
2015-11-12 64.575682
2015-11-16 65.074678
dtype: float64
Explanation:
In [17]: jpm.groupby(jpm.index.week).transform('last')
Out[17]:
Date
2015-11-02 66.982593
2015-11-04 66.982593
2015-11-06 66.982593
2015-11-10 64.575682
2015-11-12 64.575682
2015-11-16 65.074678
dtype: float64
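(DatetimeIndex.week has since been deprecated; on recent pandas versions the same grouping key can be built from isocalendar(), a sketch:)
# assumed modern replacement for jpm.index.week
jpm.groupby(jpm.index.isocalendar().week.values).transform('last').drop_duplicates(keep='last')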
Upvotes: 1