Thomas

Reputation: 12107

Resample OHLC data with pandas

There are a lot of similar questions, each with its own specific issues and answers, but I haven't found a solution that fits my case, nor do I understand how to do it.

I have typical data:

date        open    high    low     close   volume      spot
1507842000  5313.3  5345.6  5272    5295.1  22612561    5301.462201
1507845600  5295.1  5326.7  5286.1  5301.1  12127159    5308.487754
1507849200  5301.1  5467.5  5301.1  5464.5  54568881    5401.331605
1507852800  5464.7  5497    5394.9  5402.5  58411322    5446.552171
1507856400  5402.1  5542    5402.1  5541.2  50272286    5466.652636
1507860000  5540.4  5980    5440.1  5694.5  182746217   5717.856124
1507863600  5689.8  5800    5604.5  5739.6  78341266    5709.488508
1507867200  5742    5897    5713.1  5753.2  79738461    5794.402674
1507870800  5753.1  5798.9  5520.3  5574.5  87621428    5640.727381
1507874400  5574.6  5672.6  5503.2  5608.4  56964404    5591.237093
1507878000  5607.5  5689.1  5570    5660    46132190    5640.761482
1507881600  5660    5743    5634.8  5652    50173714    5690.219952

It contains not just OHLC, but also volume and spot price.

I am trying to resample hours to days.

So I load the CSV:

data_hourly = pd.read_csv('../data/hourly.csv', parse_dates=True, date_parser=date_parse, index_col=0, header=0)

(The date_parse function removes the minutes and seconds.)

I tried:

data_daily = data_hourly.resample('1D').ohlc()

This clearly doesn't work: it gives me rows with a large number of columns.

I also tried:

columns_dict = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'spot': 'average'}

data_daily = data_hourly.resample('1D', how=columns_dict)

But this crashes with an error:

"%r object has no attribute %r" % (type(self).__name__, attr)
AttributeError: 'SeriesGroupBy' object has no attribute 'average'

Besides, it tells me the 'how' argument is deprecated anyway, but I didn't see an example of how to do it the 'new' way.

Upvotes: 2

Views: 2389

Answers (1)

jezrael

Reputation: 862801

You are close: you need 'mean' instead of 'average', and you pass the dictionary to Resampler.agg:

columns_dict = {'open': 'first', 'high': 'max', 'low': 'min', 
               'close': 'last', 'volume': 'sum', 'spot': 'mean'}
data_daily = data_hourly.resample('1D').agg(columns_dict)
print(data_daily)
              open    high     low   close     volume         spot
date                                                              
2017-10-12  5313.3  5467.5  5272.0  5464.5   89308601  5337.093853
2017-10-13  5464.7  5980.0  5394.9  5652.0  690401288  5633.099780
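
For anyone who wants to try this without the original CSV, here is a minimal self-contained sketch that rebuilds the sample rows from the question and applies the same aggregation. It assumes the date column holds Unix epoch seconds in UTC and is whitespace-separated; the io.StringIO loading and the pd.to_datetime(..., unit="s") conversion are stand-ins for the question's read_csv call and its date_parse helper, which is not shown.

```python
import io

import pandas as pd

# Sample rows copied from the question (whitespace-separated).
raw = """date open high low close volume spot
1507842000 5313.3 5345.6 5272 5295.1 22612561 5301.462201
1507845600 5295.1 5326.7 5286.1 5301.1 12127159 5308.487754
1507849200 5301.1 5467.5 5301.1 5464.5 54568881 5401.331605
1507852800 5464.7 5497 5394.9 5402.5 58411322 5446.552171
1507856400 5402.1 5542 5402.1 5541.2 50272286 5466.652636
1507860000 5540.4 5980 5440.1 5694.5 182746217 5717.856124
1507863600 5689.8 5800 5604.5 5739.6 78341266 5709.488508
1507867200 5742 5897 5713.1 5753.2 79738461 5794.402674
1507870800 5753.1 5798.9 5520.3 5574.5 87621428 5640.727381
1507874400 5574.6 5672.6 5503.2 5608.4 56964404 5591.237093
1507878000 5607.5 5689.1 5570 5660 46132190 5640.761482
1507881600 5660 5743 5634.8 5652 50173714 5690.219952
"""

data_hourly = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Assumption: `date` is Unix epoch seconds (UTC). resample() needs a
# DatetimeIndex, so convert the column and use it as the index.
data_hourly["date"] = pd.to_datetime(data_hourly["date"], unit="s")
data_hourly = data_hourly.set_index("date")

# One aggregation per column; the names must be valid pandas
# aggregations, which is why 'mean' works and 'average' does not.
columns_dict = {'open': 'first', 'high': 'max', 'low': 'min',
                'close': 'last', 'volume': 'sum', 'spot': 'mean'}

data_daily = data_hourly.resample('1D').agg(columns_dict)
print(data_daily)
```

Under those assumptions this should give the same two daily rows as above. By contrast, calling .ohlc() on the whole frame computes open/high/low/close separately for every input column, which is where the large number of columns in the question comes from.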

Upvotes: 4
