Reputation: 63609
When using pandas' resample
function on a DataFrame in order to convert tick data to OHLCV, a resampling error is encountered.
How should we solve the error?
# Resample data into 30min bins
bars = data.Price.resample('30min', how='ohlc')
volumes = data.Volume.resample('30min', how='sum')
This gives the error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
Upvotes: 57
Views: 102455
Reputation: 23041
Because it is designed for time-series data, as the error says, resample()
works only if the index is datetime, timedelta or period. The following are a few common ways this error may show up.
However, you can also use the on=
parameter to use a column as grouper, without having a datetime index.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')
bars = df.resample('30min', on='Timestamp')['Price'].ohlc()
volumes = df.resample('30min', on='Timestamp')['Volume'].sum()
If you have a MultiIndex dataframe where one of the index is datetime, then you can use level=
to select that level as the grouper.
volumes = df.resample('30min', level='Timestamp')['Volume'].sum()
You can also use resample.agg
to pass multiple methods.
resampled = df.resample('30min', on='Timestamp').agg({'Price': 'ohlc', 'Volume': 'sum'})
Upvotes: 2
Reputation: 879163
Convert the integer timestamps in the index to a DatetimeIndex:
data.index = pd.to_datetime(data.index, unit='s')
This interprets the integers as seconds since the Epoch.
For example, given
data = pd.DataFrame(
{'Timestamp':[1313331280, 1313334917, 1313334917, 1313340309, 1313340309],
'Price': [10.4]*3 + [10.5]*2, 'Volume': [0.779, 0.101, 0.316, 0.150, 1.8]})
data = data.set_index(['Timestamp'])
# Price Volume
# Timestamp
# 1313331280 10.4 0.779
# 1313334917 10.4 0.101
# 1313334917 10.4 0.316
# 1313340309 10.5 0.150
# 1313340309 10.5 1.800
data.index = pd.to_datetime(data.index, unit='s')
yields
Price Volume
2011-08-14 14:14:40 10.4 0.779
2011-08-14 15:15:17 10.4 0.101
2011-08-14 15:15:17 10.4 0.316
2011-08-14 16:45:09 10.5 0.150
2011-08-14 16:45:09 10.5 1.800
Then
ticks = data.ix[:, ['Price', 'Volume']]
bars = ticks.Price.resample('30min').ohlc()
volumes = ticks.Volume.resample('30min').sum()
can be computed:
In [368]: bars
Out[368]:
open high low close
2011-08-14 14:00:00 10.4 10.4 10.4 10.4
2011-08-14 14:30:00 NaN NaN NaN NaN
2011-08-14 15:00:00 10.4 10.4 10.4 10.4
2011-08-14 15:30:00 NaN NaN NaN NaN
2011-08-14 16:00:00 NaN NaN NaN NaN
2011-08-14 16:30:00 10.5 10.5 10.5 10.5
In [369]: volumes
Out[369]:
2011-08-14 14:00:00 0.779
2011-08-14 14:30:00 NaN
2011-08-14 15:00:00 0.417
2011-08-14 15:30:00 NaN
2011-08-14 16:00:00 NaN
2011-08-14 16:30:00 1.950
Freq: 30T, Name: Volume, dtype: float64
Upvotes: 74