phaebz

Reputation: 423

Pandas MultiIndex names not working

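Assigning new column names with df.columns = colnames (the last line of the code below) raises the IndexError shown after the code. The "axis 0" in that error strikes me as odd. Where is my mistake?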

It works if I do not rename the columns before setting the MultiIndex (i.e. if I uncomment the line df = df.set_index([0, 1]) and comment out the three lines above it). I tested this with both the stable and the dev versions of pandas.

I am fairly new to Python and pandas, so any other suggestions for improvement are much appreciated.

import itertools
import datetime as dt

import numpy as np
import pandas as pd
from pandas.io.html import read_html


dfs = read_html('http://www.epexspot.com/en/market-data/auction/auction-table/2006-01-01/DE',
                attrs={'class': 'list hours responsive'},
                skiprows=1)

df = dfs[0]

# Label the rows 1, 1, 2, 2, ..., 24, 24 (each hour repeated twice)
hours = list(itertools.chain.from_iterable([[x, x] for x in range(1, 25)]))
df[0] = hours

df = df.rename(columns={0: 'a'})
df = df.rename(columns={1: 'b'})
df = df.set_index(['a', 'b'])
#df = df.set_index([0, 1])

today = dt.datetime(2006, 1, 1)
days = pd.date_range(today, periods=len(df.columns), freq='D')

colnames = [day.strftime(format='%Y-%m-%d') for day in days]
df.columns = colnames


Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/frame.py", line 2099, in __setattr__
    super(DataFrame, self).__setattr__(name, value)
  File "properties.pyx", line 59, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:29330)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/generic.py", line 656, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 1039, in set_axis
    block.set_ref_items(self.items, maybe_rename=maybe_rename)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/internals.py", line 93, in set_ref_items
    self.items = ref_items.take(self.ref_locs)
  File "/Users/user/Optional/pandas_stable_env/lib/python3.3/site-packages/pandas/core/index.py", line 395, in take
    taken = self.view(np.ndarray).take(indexer)
IndexError: index 7 is out of bounds for axis 0 with size 7
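For readers who cannot reach the EPEX page, here is a self-contained sketch of the same steps on a small synthetic frame. The layout (integer column labels 0 to 8, 48 rows, seven value columns left after set_index) is an assumption modeled on the code and the "size 7" in the traceback, not the real table, and the second key column is made up so that each (a, b) pair is unique. It also shows two idioms that may be tidier: np.repeat for the doubled hour labels and DatetimeIndex.strftime for the column names. On current pandas releases this sequence runs without the IndexError; I have not checked whether this particular synthetic frame triggers the bug on 0.12.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfs[0]: integer column labels 0..8, where
# columns 0 and 1 become the index and 2..8 hold one value per day.
n_rows = 48  # assumed: two rows per hour, hours 1..24
df = pd.DataFrame({c: np.arange(n_rows) for c in range(9)})

# Label rows 1, 1, 2, 2, ..., 24, 24 (same result as the itertools version).
df[0] = np.repeat(np.arange(1, 25), 2)
# Made-up second key so the (a, b) pairs are unique; not from the real table.
df[1] = np.tile(["x", "y"], 24)

# Both renames can be done in a single call.
df = df.rename(columns={0: "a", 1: "b"})
df = df.set_index(["a", "b"])

# One date string per remaining column, starting at 2006-01-01.
days = pd.date_range("2006-01-01", periods=len(df.columns), freq="D")
df.columns = days.strftime("%Y-%m-%d")

print(list(df.index.names))
print(list(df.columns[:2]))
```

The first print shows ['a', 'b'] and the second shows the first two date labels, '2006-01-01' and '2006-01-02'.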

Answers (1)

Jeff

Reputation: 129008

This is a very subtle bug. It is going to be fixed by https://github.com/pydata/pandas/pull/5345 in the upcoming 0.13 release (very shortly).

As a workaround, you can do this after the set_index but before the column assignment:

# Rebuild the frame column by column (use df.iteritems() on old pandas; it was removed in 2.0)
df = pd.DataFrame(dict((c, col) for c, col in df.items()))

The internal state of the frame was inconsistent: the renames followed by the set_index caused this. Rebuilding the frame from its columns recreates that state, so you can then work with it normally.
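A minimal sketch of where the workaround sits in the sequence, using a tiny made-up frame instead of the scraped table (the values and date labels are illustrative assumptions). On pandas 0.13 and later the rebuild should not be needed; there it simply produces an equivalent frame that keeps the named MultiIndex. df.items() is used instead of df.iteritems(), since iteritems was removed in pandas 2.0.

```python
import pandas as pd

# Made-up frame with integer column labels, standing in for dfs[0].
df = pd.DataFrame({0: [1, 1, 2, 2], 1: ["x", "y", "x", "y"],
                   2: [10.0, 11.0, 12.0, 13.0], 3: [20.0, 21.0, 22.0, 23.0]})

# The sequence from the question: renames followed by set_index.
df = df.rename(columns={0: "a"})
df = df.rename(columns={1: "b"})
df = df.set_index(["a", "b"])

# Workaround: rebuild the frame from its columns. Every column Series carries
# the same MultiIndex, so the rebuilt frame keeps it, level names included.
df = pd.DataFrame(dict((c, col) for c, col in df.items()))

# The column assignment that raised the IndexError on pandas 0.12.
df.columns = ["2006-01-01", "2006-01-02"]
print(df)
```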
