johnbaltis

Reputation: 1588

pandas KeyError, can't find index when using floats

I am having the following problem:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))

We see that 0.47 is in there:

[ 0.      0.0025  0.005   0.0075  0.01    0.0125  0.015   0.0175  0.02
  0.0225  0.025   0.0275  0.03    0.0325  0.035   0.0375  0.04    0.0425
  0.045   0.0475  0.05    0.0525  0.055   0.0575  0.06    0.0625  0.065
  0.0675  0.07    0.0725  0.075   0.0775  0.08    0.0825  0.085   0.0875
  0.09    0.0925  0.095   0.0975  0.1     0.1025  0.105   0.1075  0.11
  0.1125  0.115   0.1175  0.12    0.1225  0.125   0.1275  0.13    0.1325
  0.135   0.1375  0.14    0.1425  0.145   0.1475  0.15    0.1525  0.155
  0.1575  0.16    0.1625  0.165   0.1675  0.17    0.1725  0.175   0.1775
  0.18    0.1825  0.185   0.1875  0.19    0.1925  0.195   0.1975  0.2
  0.2025  0.205   0.2075  0.21    0.2125  0.215   0.2175  0.22    0.2225
  0.225   0.2275  0.23    0.2325  0.235   0.2375  0.24    0.2425  0.245
  0.2475  0.25    0.2525  0.255   0.2575  0.26    0.2625  0.265   0.2675
  0.27    0.2725  0.275   0.2775  0.28    0.2825  0.285   0.2875  0.29
  0.2925  0.295   0.2975  0.3     0.3025  0.305   0.3075  0.31    0.3125
  0.315   0.3175  0.32    0.3225  0.325   0.3275  0.33    0.3325  0.335
  0.3375  0.34    0.3425  0.345   0.3475  0.35    0.3525  0.355   0.3575
  0.36    0.3625  0.365   0.3675  0.37    0.3725  0.375   0.3775  0.38
  0.3825  0.385   0.3875  0.39    0.3925  0.395   0.3975  0.4     0.4025
  0.405   0.4075  0.41    0.4125  0.415   0.4175  0.42    0.4225  0.425
  0.4275  0.43    0.4325  0.435   0.4375  0.44    0.4425  0.445   0.4475
  0.45    0.4525  0.455   0.4575  0.46    0.4625  0.465   0.4675  0.47
  0.4725  0.475   0.4775  0.48    0.4825  0.485   0.4875  0.49    0.4925
  0.495   0.4975  0.5     0.5025  0.505   0.5075  0.51    0.5125  0.515
  0.5175  0.52    0.5225  0.525   0.5275  0.53    0.5325  0.535   0.5375
  0.54    0.5425  0.545   0.5475  0.55    0.5525  0.555   0.5575  0.56
  0.5625  0.565   0.5675  0.57    0.5725  0.575   0.5775  0.58    0.5825
  0.585   0.5875  0.59    0.5925  0.595   0.5975  0.6     0.6025  0.605
  0.6075  0.61    0.6125  0.615   0.6175  0.62    0.6225  0.625   0.6275
  0.63    0.6325  0.635   0.6375  0.64    0.6425  0.645   0.6475  0.65
  0.6525  0.655   0.6575  0.66    0.6625  0.665   0.6675  0.67    0.6725
  0.675   0.6775  0.68    0.6825  0.685   0.6875  0.69    0.6925  0.695
  0.6975  0.7     0.7025  0.705   0.7075  0.71    0.7125  0.715   0.7175
  0.72    0.7225  0.725   0.7275  0.73    0.7325  0.735   0.7375  0.74
  0.7425  0.745   0.7475  0.75    0.7525  0.755   0.7575  0.76    0.7625
  0.765   0.7675  0.77    0.7725  0.775   0.7775  0.78    0.7825  0.785
  0.7875  0.79    0.7925  0.795   0.7975  0.8     0.8025  0.805   0.8075
  0.81    0.8125  0.815   0.8175  0.82    0.8225  0.825   0.8275  0.83
  0.8325  0.835   0.8375  0.84    0.8425  0.845   0.8475  0.85    0.8525
  0.855   0.8575  0.86    0.8625  0.865   0.8675  0.87    0.8725  0.875
  0.8775  0.88    0.8825  0.885   0.8875  0.89    0.8925  0.895   0.8975
  0.9     0.9025  0.905   0.9075  0.91    0.9125  0.915   0.9175  0.92
  0.9225  0.925   0.9275  0.93    0.9325  0.935   0.9375  0.94    0.9425
  0.945   0.9475  0.95    0.9525  0.955   0.9575  0.96    0.9625  0.965
  0.9675  0.97    0.9725  0.975   0.9775  0.98    0.9825  0.985   0.9875
  0.99    0.9925  0.995   0.9975  1.    ]

Now for example I try df[0.47] and get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2133             try:
-> 2134                 return self._engine.get_loc(key)
   2135             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()

pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()

KeyError: 0.47

I don't understand why this happens.

Upvotes: 2

Views: 2673

Answers (2)

EdChum

Reputation: 394459

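The issue here is due to float imprecision: the label that prints as 0.47 is not exactly equal to the literal 0.47, so an exact label lookup misses it. Note also that df[0.47] selects a column, not a row (the traceback goes through _getitem_column), so a row lookup has to go through df.loc or df.iloc. You can use the method get_slice_bound to return the ordinal position for that row: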

In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]

Out[237]:
0    0.854001
Name: 0.47, dtype: float64

We can see the real value of that index label:

In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
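
To see the mismatch directly, here is a minimal sketch that compares the stored label with the literal. It assumes the same np.linspace(0, 1, 401) grid as the question, where 0.47 sits at position 188; the variable names are mine.

```python
import numpy as np

labels = np.linspace(0, 1, 401)
stored = labels[188]  # the label that prints as 0.47 in the array dump

# Full-precision repr of the stored label, not the shortened array display.
print(repr(float(stored)))

# Exact comparison against the literal, then a comparison within a tolerance.
print(stored == 0.47)
print(np.isclose(stored, 0.47))
```

The array printout rounds for display, which is why both values look like 0.47 there.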

Whilst pandas does support Float64Index, exact label lookup on it is going to be problematic, so you'd be better off sticking with the default Int64Index.
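
A rough sketch of what sticking with the default index could look like: keep the float coordinate in an ordinary column and select with a tolerance instead of equality. The column names "x" and "value" are made up for this example, and np.isclose is used with its default tolerances.

```python
import numpy as np
import pandas as pd

# Default integer index; the float coordinate lives in a column ("x" is a name chosen here).
df = pd.DataFrame({"x": np.linspace(0, 1, 401), "value": np.random.rand(401)})

# Boolean mask of rows whose x is within a small tolerance of 0.47.
match = df[np.isclose(df["x"], 0.47)]
print(match)
```

The grid spacing here is 0.0025, far larger than the default tolerance, so only one row should match.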

get_slice_bound is an undocumented method but the docstring gives you enough info:

Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}

You can also use get_loc and pass method='nearest' to achieve the same:

In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]

Out[240]:
0    0.854001
Name: 0.47, dtype: float64
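
As far as I know, newer pandas releases (2.0 and later) dropped the method argument from get_loc, so the call above may not run there. A sketch of the same nearest-label lookup using get_indexer, which accepts method and tolerance; the 1e-9 tolerance is an arbitrary choice for this example, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))

# get_indexer returns an array of positions; -1 means no label within the tolerance.
pos = df.index.get_indexer([0.47], method="nearest", tolerance=1e-9)[0]
if pos == -1:
    raise KeyError(0.47)

row = df.iloc[pos]  # positional row access, same as in the examples above
print(df.index[pos], row.iloc[0])
```

Passing a tolerance keeps a typo like 4.7 from silently matching the closest label on the grid.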

Upvotes: 4

Jean-François Fabre

Reputation: 140307

The printed representation may be the same, but the underlying value may be slightly different, and then the hash is different too, so the lookup fails.

Both values still display as 0.47, which is misleading.

=> You cannot reliably index your elements by a float key.

Instead, maybe use decimals as keys, or rounded values.
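
A minimal sketch of the rounded-values idea, assuming the grid from the question and that 4 decimals are enough to represent every label (they are for a 0.0025 step). Rounding should make each label the same float that the corresponding literal produces.

```python
import numpy as np
import pandas as pd

# Round the labels so that 0.47000000000000003 becomes the float the literal 0.47 gives.
index = np.round(np.linspace(0, 1, 401), 4)
df = pd.DataFrame(np.random.rand(401), index=index)

# Row lookup by label goes through .loc; df[0.47] would look for a column instead.
row = df.loc[0.47]
print(row)
```

This only helps when the keys you look up are written at the same precision as the rounding.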

Upvotes: 2
