nipy
nipy

Reputation: 5488

Issues with Pandas Timeseries Resample

I'm using python 3.5.1 and Pandas 0.18.0 and trying to use this notebook to modify financial tick data as the exercises are of interest to me:

I'm having issues with some of the commands and wondered if this is due to the versions of python and pandas?

For example:

This is the file I am reading in with associated output:

data = pd.read_csv('test30dayes2tickforpython.csv',index_col=0,        header=0,parse_dates={"Timestamp" : [0,1]})
data.dtypes
Out[80]:
 Open              float64
 High              float64
 Low               float64
 Last              float64
 Volume              int64
 NumberOfTrades      int64
 BidVolume           int64
 AskVolume           int64
dtype: object

When I try to then create another object like this:

ticks = data.ix[:, ['High','Volume']]
ticks

I get NaN values:

    High    Volume
Timestamp       
2015-12-27 23:00:25.000 NaN NaN
2015-12-27 23:01:11.000 NaN NaN

But if I use the column reference instead of names it works:

ticks = data.ix[:, [1,4]]
ticks


High    Volume
Timestamp       
2015-12-27 23:00:25.000 2045.25 1
2015-12-27 23:01:11.000 2045.50 2

Why is this?

Also, the notebook shows another object created:

bars = ticks.Price.resample('1min', how='ohlc')
bars

When I try this I get this error:

bars = ticks.High.resample('60min', how='ohlc')
bars

1 bars = ticks.High.resample('60min', how='ohlc')
AttributeError: 'DataFrame' object has no attribute 'High'

It works if I don't call the High column:

bars = ticks.resample('60min', how='ohlc')
bars

FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).ohlc()

High    Volume
open    high    low close   open    high    low close
Timestamp                               
2015-12-27 23:00:00 2045.25 2047.75 2045.25 2045.25 1.0 7.0 1.0 5.0

What is the correct command for this please?

I appreciate the notebook is probably not valid for the version of Python/Pandas Im using but as a newbie it is very useful for me so would like to get it working on my data.

Upvotes: 1

Views: 2242

Answers (1)

jezrael
jezrael

Reputation: 862601

There is problem spaces in column names.

print (data.columns)
Index(['Timestamp', ' Open', ' High', ' Low', ' Last', ' Volume',
       ' NumberOfTrades', ' BidVolume', ' AskVolume'],
      dtype='object')

You can strip this spaces:

data.columns = data.columns.str.strip()
print (data.columns)
Index(['Timestamp', 'Open', 'High', 'Low', 'Last', 'Volume', 'NumberOfTrades',
       'BidVolume', 'AskVolume'],
      dtype='object')

ticks = data.ix[:, ['High','Volume']]
print (ticks.head())
      High  Volume
0  2045.25       1
1  2045.50       2
2  2045.50       2
3  2045.50       2
4  2045.50       2

Now you can use:

print (ticks.Price.resample('1min', how='ohlc'))

If you dont want remove spaces, add space to column name:

print (ticks[' Price'].resample('1min', how='ohlc'))

But better is use Resampler.ohlc, if pandas version higher as 0.18.0:

print (ticks.Price.resample('1min').ohlc())

Upvotes: 2

Related Questions