Judking
Judking

Reputation: 6371

converting daily stock data to weekly-based via pandas in Python

I've got a DataFrame storing daily-based data which is as below:

Date              Open        High         Low       Close   Volume
2010-01-04   38.660000   39.299999   38.509998   39.279999  1293400   
2010-01-05   39.389999   39.520000   39.029999   39.430000  1261400   
2010-01-06   39.549999   40.700001   39.020000   40.250000  1879800   
2010-01-07   40.090000   40.349998   39.910000   40.090000   836400   
2010-01-08   40.139999   40.310001   39.720001   40.290001   654600   
2010-01-11   40.209999   40.520000   40.040001   40.290001   963600   
2010-01-12   40.160000   40.340000   39.279999   39.980000  1012800   
2010-01-13   39.930000   40.669998   39.709999   40.560001  1773400   
2010-01-14   40.490002   40.970001   40.189999   40.520000  1240600   
2010-01-15   40.570000   40.939999   40.099998   40.450001  1244200   

What I intend to do is to merge it into weekly-based data. After grouping:

  1. the Date should be every Monday (at this point, holidays scenario should be considered when Monday is not a trading day, we should apply the first trading day in current week as the Date).
  2. Open should be Monday's (or the first trading day of current week) Open.
  3. Close should be Friday's (or the last trading day of current week) Close.
  4. High should be the highest High of trading days in current week.
  5. Low should be the lowest Low of trading days in current week.
  6. Volumn should be the sum of all Volumes of trading days in current week.

which should look like this:

Date              Open        High         Low       Close   Volume
2010-01-04   38.660000   40.700001   38.509998   40.290001  5925600   
2010-01-11   40.209999   40.970001   39.279999   40.450001  6234600   

Currently, my code snippet is as below, which function should I use to mapping daily-based data to the expected weekly-based data? Many thanks!

import pandas_datareader.data as web

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2016, 12, 31)
f = web.DataReader("MNST", "yahoo", start, end, session=session)
print f

Upvotes: 33

Views: 56632

Answers (8)

Stefan
Stefan

Reputation: 42885

You can resample (to weekly), offset (shift), and apply aggregation rules as follows [updated to reflect changes in pandas 1.1]:

logic = {'Open'  : 'first',
         'High'  : 'max',
         'Low'   : 'min',
         'Close' : 'last',
         'Volume': 'sum'}
    
df = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
df = df.resample('W').apply(logic)

# changes in Pandas 1.1
from pandas.tseries.frequencies import to_offset
df.index = df.index - to_offset('6D')

to get:

                 Open       High        Low      Close   Volume
Date                                                           
2010-01-04  38.660000  40.700001  38.509998  40.290001  5925600
2010-01-11  40.209999  40.970001  39.279999  40.450001  6234600

Upvotes: 44

crazyman
crazyman

Reputation: 1

There was one limitation in the mentioned code that if there was an holiday on Monday then it would not take that data so I have created a function to overcome that. Hope it helps. Mention the day name as "Monday" or whatever you want

def convert_to_weekly(df,day):
    #Initialising dataframe to store converted data
    converted_data = pd.DataFrame()        
    converted_data_temp = pd.DataFrame({"Date":[0],"Open":[0],"High":[0],"Low":[0],"Close":[0],"Volume":[0]})
    
    #Converting Date to Datetime and setting that as index
    df['Date str'] = df['Date']
    df['Date'] = df['Date'].map(lambda a : datetime.datetime.strptime(a, '%d/%m/%Y'))
    df.set_index('Date', inplace=True)
    df.sort_index(inplace=True)
    
    #Finding first date which is our start Day
    start_date = df.index[0]
    for i in range(0,6):
        start_date = start_date + timedelta(days=i)
        day_name = start_date.strftime("%A")

        if(day_name == day):
            break
    
    #Taking a week back to include middle data
    start_date = start_date - timedelta(days=7)
    
    while(start_date < df.index[-1]):
        
        stop_date = start_date + timedelta(days=7)
        
        data_temp = df[start_date:stop_date - timedelta(days=1)]
        
        if(data_temp.size == 0):
            start_date = start_date + timedelta(days=7)
            continue
        
        data_temp = data_temp.reset_index()
        
        converted_data_temp['Date'] = data_temp['Date str'].iloc[0]
        converted_data_temp['Open'] = data_temp['Open'].iloc[0]
        converted_data_temp['High'] = data_temp['High'].max()
        converted_data_temp['Low'] = data_temp['Low'].min()
        converted_data_temp['Close'] = data_temp['Close'].iloc[-1]
        try:
            converted_data_temp['Volume'] = data_temp['Volume'].sum()
        except:
            pass
        
        converted_data = pd.concat([converted_data, converted_data_temp])
        
        start_date = start_date + timedelta(days=7)
    
    converted_data = converted_data.reset_index()
    converted_data = converted_data.drop(['index'],axis=1)
    return converted_data

df_w = convert_to_weekly(df, "Monday")

Upvotes: 0

William Le
William Le

Reputation: 1171

Updated solution for 2022

import pandas as pd
from pandas.tseries.frequencies import to_offset

df = pd.read_csv('your_ticker.csv')

logic = {'<Open>'  : 'first',
         '<High>'  : 'max',
         '<Low>'   : 'min',
         '<Close>' : 'last',
         '<Volume>': 'sum'}
df['<DTYYYYMMDD>'] = pd.to_datetime(df['<DTYYYYMMDD>'])
df = df.set_index('<DTYYYYMMDD>')
df = df.sort_index()
df = df.resample('W').apply(logic)
df.index = df.index - pd.tseries.frequencies.to_offset("6D")

Upvotes: 3

tom
tom

Reputation: 387

At first I use df.resample() according to the answers forementioned, but it fills NaN when a week is missed, unhappy about that, after some research, I use groupby() instead of resample(). Thanks for your sharing.

My original data is:

           c      date        h        l        o
260  6014.78  20220321  6053.90  5984.79  6030.43
261  6052.59  20220322  6099.53  5995.22  6012.17
262  6040.86  20220323  6070.85  6008.26  6059.11
263  6003.05  20220324  6031.73  5987.40  6020.00
264  5931.33  20220325  6033.04  5928.72  6033.04
265  5946.98  20220328  5946.98  5830.93  5871.35
266  5900.04  20220329  5958.71  5894.82  5950.89
267  6003.05  20220330  6003.05  5913.08  5913.08
268  6033.04  20220331  6059.11  5978.27  5993.92
269  6126.91  20220401  6134.74  5975.66  6006.96
270  6149.08  20220406  6177.77  6106.05  6126.91
271  6134.74  20220407  6171.25  6091.71  6130.83
272  6151.69  20220408  6160.82  6096.93  6147.78
273  6095.62  20220411  6166.03  6072.15  6164.73
274  6184.28  20220412  6228.62  6049.99  6094.32
275  6119.09  20220413  6180.37  6117.79  6173.85
276  6188.20  20220414  6201.24  6132.13  6150.38
277  6173.85  20220415  6199.93  6137.35  6137.35
278  6124.31  20220418  6173.85  6108.66  6173.85
279  6065.63  20220419  6147.78  6042.16  6124.31

I don't care the date is not Monday, so I didn't handle that, the code is:

data['Date'] = pd.to_datetime(data['date'], format="%Y%m%d")
# Refer to: https://www.techtrekking.com/how-to-convert-daily-time-series-data-into-weekly-and-monthly-using-pandas-and-python/
# and here: https://stackoverflow.com/a/60518425/5449346
# and this: https://github.com/pandas-dev/pandas/issues/11217#issuecomment-145253671
logic = {'o'  : 'first',
         'h'  : 'max',
         'l'   : 'min',
         'c' : 'last',
         'Date': 'first',
         }
data = data.groupby([data['Date'].dt.year, data['Date'].dt.week]).agg(logic)
data.set_index('Date', inplace=True)

And the result is, there's no NaN on 2022.01.31 which resample() will produce:

                  l        o        h        c
Date
2021-11-29  6284.68  6355.09  6421.59  6382.47
2021-12-06  6365.52  6372.04  6700.62  6593.70
2021-12-13  6445.06  6593.70  6690.19  6450.28
2021-12-20  6415.07  6437.24  6531.12  6463.31
2021-12-27  6463.31  6473.75  6794.50  6649.77
2022-01-04  6625.00  6649.77  7089.18  7055.27
2022-01-10  6804.93  7055.27  7181.75  6808.84
2022-01-17  6769.73  6776.25  7098.30  6919.67
2022-01-24  6692.80  6906.63  7048.76  6754.08
2022-02-07  6737.13  6811.45  7056.58  7023.98
2022-02-14  6815.36  7073.53  7086.57  6911.85
2022-02-21  6634.12  6880.56  6904.03  6668.02
2022-02-28  6452.88  6669.33  6671.93  6493.30
2022-03-07  5953.50  6463.31  6468.53  6228.62
2022-03-14  5817.90  6154.30  6205.15  6027.82
2022-03-21  5928.72  6030.43  6099.53  5931.33
2022-03-28  5830.93  5871.35  6134.74  6126.91
2022-04-06  6091.71  6126.91  6177.77  6151.69
2022-04-11  6049.99  6164.73  6228.62  6173.85
2022-04-18  6042.16  6173.85  6173.85  6065.63

Upvotes: 1

Neo
Neo

Reputation: 4880

Adding to @Stefan 's answer with recent pandas API as loffset was deprecated since version 1.1.0 and later removed.

df = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
logic = {'Open'  : 'first',
         'High'  : 'max',
         'Low'   : 'min',
         'Close' : 'last',
         'Volume': 'sum'}

dfw = df.resample('W').apply(logic)
# set the index to the beginning of the week
dfw.index = dfw.index - pd.tseries.frequencies.to_offset("6D")

Upvotes: 3

Matthew Scarborough
Matthew Scarborough

Reputation: 131

I had the exact same question and found a great solution here.

https://www.techtrekking.com/how-to-convert-daily-time-series-data-into-weekly-and-monthly-using-pandas-and-python/

The weekly code is posted below.

import pandas as pd
import numpy as np

print('*** Program Started ***')

df = pd.read_csv('15-06-2016-TO-14-06-2018HDFCBANKALLN.csv')

# ensuring only equity series is considered
df = df.loc[df['Series'] == 'EQ']

# Converting date to pandas datetime format
df['Date'] = pd.to_datetime(df['Date'])
# Getting week number
df['Week_Number'] = df['Date'].dt.week
# Getting year. Weeknum is common across years to we need to create unique index by using year and weeknum
df['Year'] = df['Date'].dt.year

# Grouping based on required values
df2 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum'})
# df3 = df.groupby(['Year','Week_Number']).agg({'Open Price':'first', 'High Price':'max', 'Low Price':'min', 'Close Price':'last','Total Traded Quantity':'sum','Average Price':'avg'})
df2.to_csv('Weekly_OHLC.csv')
print('*** Program ended ***')

Upvotes: 12

L. Astola
L. Astola

Reputation: 55

Not a direct answer, but suppose the columns are the dates (transpose of your table), without missing dates.

'''sum up daily results in df to weekly results in wdf'''
wdf = pd.DataFrame(index = df.index)
for i in range(len(df.columns)):
    if (i!=0) & (i%7==0):
        wdf['week'+str(i//7)]= df[df.columns[i-7:i]].sum(axis = 1)

Upvotes: 0

goofd
goofd

Reputation: 2098

In general, assuming that you have the dataframe in the form you specified, you need to do the following steps:

  1. put Date in the index
  2. resample the index.

What you have is a case of applying different functions to different columns. See.

You can resample in various ways. for e.g. you can take the mean of the values or count or so on. check pandas resample.

You can also apply custom aggregators (check the same link). With that in mind, the code snippet for your case can be given as:

f['Date'] = pd.to_datetime(f['Date'])
f.set_index('Date', inplace=True)
f.sort_index(inplace=True)

def take_first(array_like):
    return array_like[0]

def take_last(array_like):
    return array_like[-1]

output = f.resample('W',                                 # Weekly resample
                    how={'Open': take_first, 
                         'High': 'max',
                         'Low': 'min',
                         'Close': take_last,
                         'Volume': 'sum'}, 
                    loffset=pd.offsets.timedelta(days=-6))  # to put the labels to Monday

output = output[['Open', 'High', 'Low', 'Close', 'Volume']]

Here, W signifies a weekly resampling which by default spans from Monday to Sunday. To keep the labels as Monday, loffset is used. There are several predefined day specifiers. Take a look at pandas offsets. You can even define custom offsets (see).

Coming back to the resampling method. Here for Open and Close you can specify custom methods to take the first value or so on and pass the function handle to the how argument.

This answer is based on the assumption that the data seems to be daily, i.e. for each day you have only 1 entry. Also, no data is present for the non-business days. i.e. Sat and Sun. So taking the last data point for the week as the one for Friday is ok. If you so want you can use business week instead of 'W'. Also, for more complex data you may want to use groupby to group the weekly data and then work on the time indices within them.

btw a gist for the solution can be found at: https://gist.github.com/prithwi/339f87bf9c3c37bb3188

Upvotes: 22

Related Questions