jespern
jespern

Reputation: 201

Resampling trade data into OHLCV with pandas

I have historical trade data in a pandas DataFrame, containing price and volume columns, indexed by a DateTimeIndex.

For example:

>>> print df.tail()
                             price  volume
2014-01-15 14:29:54+00:00  949.975    0.01
2014-01-15 14:29:59+00:00  941.370    0.01
2014-01-15 14:30:17+00:00  949.975    0.01
2014-01-15 14:30:24+00:00  941.370    0.01
2014-01-15 14:30:36+00:00  949.975    0.01

Now, I can resample this into OHLC data using df.resample(freq, how={'price': 'ohlc'}), which is fine, but I'd also like to include the volume.

When I try df.resample(freq, how={'price': 'ohlc', 'volume': 'sum'}), I get:

ValueError: Shape of passed values is (2,), indices imply (2, 95)

I'm not quite sure what is wrong with my dataset, or why this fails. Could anyone help shed some light on this? Much appreciated.

Upvotes: 17

Views: 14269

Answers (3)

Roman Skrypin
Roman Skrypin

Reputation: 21

price = df['price'].resample('30s').ohlc(_method='ohlc')
volume = df['volume'].resample('30s').sum()
data = pd.concat([price, volume], axis=1)
                        open     high      low    close  volume
2014-01-15 14:29:30  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:00  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:30  949.975  949.975  949.975  949.975    0.01

Upvotes: 2

campervancoder
campervancoder

Reputation: 1629

You can now do this in later versions of Pandas Example: Pandas version 0.22.00 df.resample('30S').mean()

Upvotes: 3

TomAugspurger
TomAugspurger

Reputation: 28946

The problem isn't with the resampling, it's from trying to concat a MultiIndex (from the price OHLC), with a regular index (for the Volume sum).

In [17]: df
Out[17]: 
                       price  volume
2014-01-15 14:29:54  949.975    0.01
2014-01-15 14:29:59  941.370    0.01
2014-01-15 14:30:17  949.975    0.01
2014-01-15 14:30:24  941.370    0.01
2014-01-15 14:30:36  949.975    0.01

[5 rows x 2 columns]

In [18]: df.resample('30s', how={'price': 'ohlc'})  # Note the MultiIndex
Out[18]: 
                       price                           
                        open     high      low    close
2014-01-15 14:29:30  949.975  949.975  941.370  941.370
2014-01-15 14:30:00  949.975  949.975  941.370  941.370
2014-01-15 14:30:30  949.975  949.975  949.975  949.975

[3 rows x 4 columns]

In [19]: df.resample('30s', how={'volume': 'sum'})  # Regular Index for columns
Out[19]: 
                     volume
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

I guess you could manually create a MultiIndex for (volume, sum) and then concat:

In [34]: vol = df.resample('30s', how={'volume': 'sum'})

In [35]: vol.columns = pd.MultiIndex.from_tuples([('volume', 'sum')])

In [36]: vol
Out[36]: 
                     volume
                        sum
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

In [37]: price = df.resample('30s', how={'price': 'ohlc'})

In [38]: pd.concat([price, vol], axis=1)
Out[38]: 
                       price                             volume
                        open     high      low    close     sum
2014-01-15 14:29:30  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:00  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:30  949.975  949.975  949.975  949.975    0.01

[3 rows x 5 columns]

But it might be better if resample could handle this automatically.

Upvotes: 15

Related Questions