Reputation: 6371
I've got a DataFrame
storing daily-based data which is as below:
Date Open High Low Close Volume
2010-01-04 38.660000 39.299999 38.509998 39.279999 1293400
2010-01-05 39.389999 39.520000 39.029999 39.430000 1261400
2010-01-06 39.549999 40.700001 39.020000 40.250000 1879800
2010-01-07 40.090000 40.349998 39.910000 40.090000 836400
2010-01-08 40.139999 40.310001 39.720001 40.290001 654600
2010-01-11 40.209999 40.520000 40.040001 40.290001 963600
2010-01-12 40.160000 40.340000 39.279999 39.980000 1012800
2010-01-13 39.930000 40.669998 39.709999 40.560001 1773400
2010-01-14 40.490002 40.970001 40.189999 40.520000 1240600
2010-01-15 40.570000 40.939999 40.099998 40.450001 1244200
What I intend to do is to merge it into weekly-based data. After grouping:
which should look like this:
Date Open High Low Close Volume
2010-01-04 38.660000 40.700001 38.509998 40.290001 5925600
2010-01-11 40.209999 40.970001 39.279999 40.450001 6234600
Currently, my code snippet is as below, which function should I use to mapping daily-based data to the expected weekly-based data? Many thanks!
import pandas_datareader.data as web
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2016, 12, 31)
f = web.DataReader("MNST", "yahoo", start, end, session=session)
print f
Upvotes: 33
Views: 56632
Reputation: 42885
You can resample
(to weekly), offset
(shift), and apply
aggregation rules as follows [updated to reflect changes in pandas 1.1]:
logic = {'Open' : 'first',
'High' : 'max',
'Low' : 'min',
'Close' : 'last',
'Volume': 'sum'}
df = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
df = df.resample('W').apply(logic)
# changes in Pandas 1.1
from pandas.tseries.frequencies import to_offset
df.index = df.index - to_offset('6D')
to get:
Open High Low Close Volume
Date
2010-01-04 38.660000 40.700001 38.509998 40.290001 5925600
2010-01-11 40.209999 40.970001 39.279999 40.450001 6234600
Upvotes: 44
Reputation: 1
There was one limitation in the mentioned code that if there was an holiday on Monday then it would not take that data so I have created a function to overcome that. Hope it helps. Mention the day name as "Monday" or whatever you want
def convert_to_weekly(df,day):
#Initialising dataframe to store converted data
converted_data = pd.DataFrame()
converted_data_temp = pd.DataFrame({"Date":[0],"Open":[0],"High":[0],"Low":[0],"Close":[0],"Volume":[0]})
#Converting Date to Datetime and setting that as index
df['Date str'] = df['Date']
df['Date'] = df['Date'].map(lambda a : datetime.datetime.strptime(a, '%d/%m/%Y'))
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
#Finding first date which is our start Day
start_date = df.index[0]
for i in range(0,6):
start_date = start_date + timedelta(days=i)
day_name = start_date.strftime("%A")
if(day_name == day):
break
#Taking a week back to include middle data
start_date = start_date - timedelta(days=7)
while(start_date < df.index[-1]):
stop_date = start_date + timedelta(days=7)
data_temp = df[start_date:stop_date - timedelta(days=1)]
if(data_temp.size == 0):
start_date = start_date + timedelta(days=7)
continue
data_temp = data_temp.reset_index()
converted_data_temp['Date'] = data_temp['Date str'].iloc[0]
converted_data_temp['Open'] = data_temp['Open'].iloc[0]
converted_data_temp['High'] = data_temp['High'].max()
converted_data_temp['Low'] = data_temp['Low'].min()
converted_data_temp['Close'] = data_temp['Close'].iloc[-1]
try:
converted_data_temp['Volume'] = data_temp['Volume'].sum()
except:
pass
converted_data = pd.concat([converted_data, converted_data_temp])
start_date = start_date + timedelta(days=7)
converted_data = converted_data.reset_index()
converted_data = converted_data.drop(['index'],axis=1)
return converted_data
df_w = convert_to_weekly(df, "Monday")
Upvotes: 0
Reputation: 1171
import pandas as pd
from pandas.tseries.frequencies import to_offset
df = pd.read_csv('your_ticker.csv')
logic = {'<Open>' : 'first',
'<High>' : 'max',
'<Low>' : 'min',
'<Close>' : 'last',
'<Volume>': 'sum'}
df['<DTYYYYMMDD>'] = pd.to_datetime(df['<DTYYYYMMDD>'])
df = df.set_index('<DTYYYYMMDD>')
df = df.sort_index()
df = df.resample('W').apply(logic)
df.index = df.index - pd.tseries.frequencies.to_offset("6D")
Upvotes: 3
Reputation: 387
At first I use df.resample() according to the answers forementioned, but it fills NaN when a week is missed, unhappy about that, after some research, I use groupby() instead of resample(). Thanks for your sharing.
My original data is:
c date h l o
260 6014.78 20220321 6053.90 5984.79 6030.43
261 6052.59 20220322 6099.53 5995.22 6012.17
262 6040.86 20220323 6070.85 6008.26 6059.11
263 6003.05 20220324 6031.73 5987.40 6020.00
264 5931.33 20220325 6033.04 5928.72 6033.04
265 5946.98 20220328 5946.98 5830.93 5871.35
266 5900.04 20220329 5958.71 5894.82 5950.89
267 6003.05 20220330 6003.05 5913.08 5913.08
268 6033.04 20220331 6059.11 5978.27 5993.92
269 6126.91 20220401 6134.74 5975.66 6006.96
270 6149.08 20220406 6177.77 6106.05 6126.91
271 6134.74 20220407 6171.25 6091.71 6130.83
272 6151.69 20220408 6160.82 6096.93 6147.78
273 6095.62 20220411 6166.03 6072.15 6164.73
274 6184.28 20220412 6228.62 6049.99 6094.32
275 6119.09 20220413 6180.37 6117.79 6173.85
276 6188.20 20220414 6201.24 6132.13 6150.38
277 6173.85 20220415 6199.93 6137.35 6137.35
278 6124.31 20220418 6173.85 6108.66 6173.85
279 6065.63 20220419 6147.78 6042.16 6124.31
I don't care the date is not Monday, so I didn't handle that, the code is:
data['Date'] = pd.to_datetime(data['date'], format="%Y%m%d")
# Refer to: https://www.techtrekking.com/how-to-convert-daily-time-series-data-into-weekly-and-monthly-using-pandas-and-python/
# and here: https://stackoverflow.com/a/60518425/5449346
# and this: https://github.com/pandas-dev/pandas/issues/11217#issuecomment-145253671
logic = {'o' : 'first',
'h' : 'max',
'l' : 'min',
'c' : 'last',
'Date': 'first',
}
data = data.groupby([data['Date'].dt.year, data['Date'].dt.week]).agg(logic)
data.set_index('Date', inplace=True)
And the result is, there's no NaN on 2022.01.31 which resample() will produce:
l o h c
Date
2021-11-29 6284.68 6355.09 6421.59 6382.47
2021-12-06 6365.52 6372.04 6700.62 6593.70
2021-12-13 6445.06 6593.70 6690.19 6450.28
2021-12-20 6415.07 6437.24 6531.12 6463.31
2021-12-27 6463.31 6473.75 6794.50 6649.77
2022-01-04 6625.00 6649.77 7089.18 7055.27
2022-01-10 6804.93 7055.27 7181.75 6808.84
2022-01-17 6769.73 6776.25 7098.30 6919.67
2022-01-24 6692.80 6906.63 7048.76 6754.08
2022-02-07 6737.13 6811.45 7056.58 7023.98
2022-02-14 6815.36 7073.53 7086.57 6911.85
2022-02-21 6634.12 6880.56 6904.03 6668.02
2022-02-28 6452.88 6669.33 6671.93 6493.30
2022-03-07 5953.50 6463.31 6468.53 6228.62
2022-03-14 5817.90 6154.30 6205.15 6027.82
2022-03-21 5928.72 6030.43 6099.53 5931.33
2022-03-28 5830.93 5871.35 6134.74 6126.91
2022-04-06 6091.71 6126.91 6177.77 6151.69
2022-04-11 6049.99 6164.73 6228.62 6173.85
2022-04-18 6042.16 6173.85 6173.85 6065.63
Upvotes: 1
Reputation: 4880
Adding to @Stefan 's answer with recent pandas API as loffset
was deprecated since version 1.1.0 and later removed.
df = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
logic = {'Open' : 'first',
'High' : 'max',
'Low' : 'min',
'Close' : 'last',
'Volume': 'sum'}
dfw = df.resample('W').apply(logic)
# set the index to the beginning of the week
dfw.index = dfw.index - pd.tseries.frequencies.to_offset("6D")
Upvotes: 3
Reputation: 131
I had the exact same question and found a great solution here.
The weekly code is posted below.
import pandas as pd
import numpy as np
print('*** Program Started ***')
df = pd.read_csv('15-06-2016-TO-14-06-2018HDFCBANKALLN.csv')
# ensuring only equity series is considered
df = df.loc[df['Series'] == 'EQ']
# Converting date to pandas datetime format
df['Date'] = pd.to_datetime(df['Date'])
# Getting week number
df['Week_Number'] = df['Date'].dt.week
# Getting year. Weeknum is common across years to we need to create unique index by using year and weeknum
df['Year'] = df['Date'].dt.year
# Grouping based on required values
df2 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum'})
# df3 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum','Average Price':'avg'})
df2.to_csv('Weekly_OHLC.csv')
print('*** Program ended ***')
Upvotes: 12
Reputation: 55
Not a direct answer, but suppose the columns are the dates (transpose of your table), without missing dates.
'''sum up daily results in df to weekly results in wdf'''
wdf = pd.DataFrame(index = df.index)
for i in range(len(df.columns)):
if (i!=0) & (i%7==0):
wdf['week'+str(i//7)]= df[df.columns[i-7:i]].sum(axis = 1)
Upvotes: 0
Reputation: 2098
In general, assuming that you have the dataframe in the form you specified, you need to do the following steps:
Date
in the indexresample
the index. What you have is a case of applying different functions to different columns. See.
You can resample in various ways. for e.g. you can take the mean of the values or count or so on. check pandas resample.
You can also apply custom aggregators (check the same link). With that in mind, the code snippet for your case can be given as:
f['Date'] = pd.to_datetime(f['Date'])
f.set_index('Date', inplace=True)
f.sort_index(inplace=True)
def take_first(array_like):
return array_like[0]
def take_last(array_like):
return array_like[-1]
output = f.resample('W', # Weekly resample
how={'Open': take_first,
'High': 'max',
'Low': 'min',
'Close': take_last,
'Volume': 'sum'},
loffset=pd.offsets.timedelta(days=-6)) # to put the labels to Monday
output = output[['Open', 'High', 'Low', 'Close', 'Volume']]
Here, W
signifies a weekly resampling which by default spans from Monday to Sunday. To keep the labels as Monday, loffset
is used.
There are several predefined day specifiers. Take a look at pandas offsets. You can even define custom offsets (see).
Coming back to the resampling method. Here for Open
and Close
you can specify custom methods to take the first value or so on and pass the function handle to the how
argument.
This answer is based on the assumption that the data seems to be daily, i.e. for each day you have only 1 entry. Also, no data is present for the non-business days. i.e. Sat and Sun. So taking the last data point for the week as the one for Friday is ok. If you so want you can use business week instead of 'W'. Also, for more complex data you may want to use groupby
to group the weekly data and then work on the time indices within them.
btw a gist for the solution can be found at: https://gist.github.com/prithwi/339f87bf9c3c37bb3188
Upvotes: 22