Mike

Reputation: 7203

Large index values causing pandas KeyError

I set up a DataFrame with a UInt64Index like so:

import pandas

df = pandas.DataFrame([[1, 2, 3], [4, 5, 9223943912072220999], [7, 8, 9]], columns=['a', 'b', 'c'])
df = df.set_index('c')
>>> df
                     a  b
c              
3                    1  2
9223943912072220999  4  5
9                    7  8

>>> df.index
UInt64Index([3, 9223943912072220999, 9], dtype='uint64', name=u'c')

Accessing elements by index value works for the smaller values:

>>> df.index[0]
3
>>> df.loc[3]
a    1
b    2
Name: 3, dtype: int64

But doing the same thing for the big value raises an error:

>>> df.index[1]
9223943912072220999
>>> df.loc[9223943912072220999]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/u1/mprager/.virtualenvs/jupyter/local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1373, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/u1/mprager/.virtualenvs/jupyter/local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1626, in _getitem_axis
    self._has_valid_type(key, axis)
  File "/home/u1/mprager/.virtualenvs/jupyter/local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1514, in _has_valid_type
    error()
  File "/home/u1/mprager/.virtualenvs/jupyter/local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1501, in error
    axis=self.obj._get_axis_name(axis)))
KeyError: u'the label [9223943912072220999] is not in the [index]'

I thought it might be some kind of dtype issue, but even df.loc[df.index[1]] raises the same error.

This is using pandas 0.22.0 on Python 2.7.9.

Upvotes: 3

Views: 1498

Answers (1)

cs95

Reputation: 402603

This could be a bug. 9223943912072220999 is larger than 2**63 - 1 (9223372036854775807), so it does not fit in a signed 64-bit integer (a C signed long on most 64-bit Linux builds), and that appears to be what causes problems with loc. One alternative is to look up the label's position with df.index.get_loc and then use iloc for position-based indexing:

i = df.index.get_loc(9223943912072220999)
df.iloc[i]

a    4
b    5
Name: 9223943912072220999, dtype: int64
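
The claim that the value is too large for a signed 64-bit integer can be checked with plain Python arithmetic. This is only a sketch of the range check, not a reproduction of what pandas does internally; the names INT64_MAX, UINT64_MAX, fits_signed and fits_unsigned are made up for illustration.

```python
# The label from the question.
value = 9223943912072220999

# Limits of 64-bit integers (illustrative constants, not pandas names).
INT64_MAX = 2**63 - 1    # 9223372036854775807
UINT64_MAX = 2**64 - 1   # 18446744073709551615

# Does the label fit in a signed / unsigned 64-bit integer?
fits_signed = -2**63 <= value <= INT64_MAX
fits_unsigned = 0 <= value <= UINT64_MAX

print(fits_signed)    # False
print(fits_unsigned)  # True
```

So the label is representable as uint64 (which is why the index ends up as a UInt64Index) but not as int64, which is consistent with a signed conversion somewhere in the loc code path being the culprit.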

Another alternative would be to treat the index as an object array:

df.index = df.index.astype(object)

This allows you to work with arbitrarily large numbers (basically, anything that you can hash can now sit inside an object index):

df.loc[9223943912072220999]

a    4
b    5
Name: 9223943912072220999, dtype: int64

Note that, as far as alternatives go, this is one of the worse ones: an object index is likely to be less performant than a native integer index.

Upvotes: 4
