Reputation: 423
The axis 0 in the IndexError strikes me as odd. Where is my mistake?
It works if I do not rename the columns before setting the MultiIndex (uncomment the line df = df.set_index([0, 1]) and comment out the three lines above it). Tested with the stable and dev versions.
I am fairly new to Python and pandas, so any other suggestions for improvement are much appreciated.
import itertools
import datetime as dt
import numpy as np
import pandas as pd
from pandas.io.html import read_html
dfs = read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2006-01-01/DE',
                attrs={'class': 'list hours responsive'},
                skiprows=1)
df = dfs[0]
hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, 25)]))
df[0] = hours
df = df.rename(columns={0: 'a'})
df = df.rename(columns={1: 'b'})
df = df.set_index(['a', 'b'])
#df = df.set_index([0, 1])
today = dt.datetime(2006, 1, 1)
days = pd.date_range(today, periods=len(df.columns), freq='D')
colnames = [day.strftime(format='%Y-%m-%d') for day in days]
df.columns = colnames
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/frame.py", line 2099, in __setattr__
    super(DataFrame, self).__setattr__(name, value)
  File "properties.pyx", line 59, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:29330)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/generic.py", line 656, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 1039, in set_axis
    block.set_ref_items(self.items, maybe_rename=maybe_rename)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 93, in set_ref_items
    self.items = ref_items.take(self.ref_locs)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/index.py", line 395, in take
    taken = self.view(np.ndarray).take(indexer)
IndexError: index 7 is out of bounds for axis 0 with size 7
Upvotes: 1
Views: 1377
Reputation: 129008
This is a very subtle bug. It will be fixed by https://github.com/pydata/pandas/pull/5345 in the upcoming 0.13 release (very shortly).
As a workaround, you can do this after the set_index but before the column assignment:
df = pd.DataFrame(dict([(c, col) for c, col in df.iteritems()]))
The internal state of the frame was off; the renames followed by the set_index caused this. Recreating the frame from its columns gives it a fresh internal state, so you can work with it.
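For reference, here is a minimal sketch of where the rebuild step sits in the sequence, using a small made-up frame instead of the scraped table (the values and the name rebuilt are only for illustration). It assumes a recent pandas, where the column iterator is spelled items(); on the 0.12-era install from the question, iteritems() as written above is the call to use. Current releases should not show the bug at all, so this only illustrates the order of the steps, not a reproduction.

```python
import pandas as pd

# Made-up stand-in for the scraped table: integer column labels,
# two rows per hour as in the question.
df = pd.DataFrame({
    0: [1, 1, 2, 2],
    1: ["price", "volume", "price", "volume"],
    2: [10.0, 100.0, 11.0, 110.0],
    3: [12.0, 120.0, 13.0, 130.0],
})

# Same steps as in the question: rename, then build the MultiIndex.
df = df.rename(columns={0: "a", 1: "b"})
df = df.set_index(["a", "b"])

# Workaround step: rebuild the frame from its own columns.
# (items() is assumed here; older pandas spells it iteritems().)
rebuilt = pd.DataFrame(dict(df.items()))

# Only now assign the new column labels.
days = pd.date_range("2006-01-01", periods=len(rebuilt.columns), freq="D")
rebuilt.columns = [day.strftime("%Y-%m-%d") for day in days]
```

The rebuild keeps the row index because each column is a Series that carries it along; only the column bookkeeping is recreated.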
Upvotes: 1