Reputation: 1588
I am having the following problem:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))
print(np.linspace(0, 1, 401))
We see that 0.47 is in there:
[ 0. 0.0025 0.005 0.0075 0.01 0.0125 0.015 0.0175 0.02
0.0225 0.025 0.0275 0.03 0.0325 0.035 0.0375 0.04 0.0425
0.045 0.0475 0.05 0.0525 0.055 0.0575 0.06 0.0625 0.065
0.0675 0.07 0.0725 0.075 0.0775 0.08 0.0825 0.085 0.0875
0.09 0.0925 0.095 0.0975 0.1 0.1025 0.105 0.1075 0.11
0.1125 0.115 0.1175 0.12 0.1225 0.125 0.1275 0.13 0.1325
0.135 0.1375 0.14 0.1425 0.145 0.1475 0.15 0.1525 0.155
0.1575 0.16 0.1625 0.165 0.1675 0.17 0.1725 0.175 0.1775
0.18 0.1825 0.185 0.1875 0.19 0.1925 0.195 0.1975 0.2
0.2025 0.205 0.2075 0.21 0.2125 0.215 0.2175 0.22 0.2225
0.225 0.2275 0.23 0.2325 0.235 0.2375 0.24 0.2425 0.245
0.2475 0.25 0.2525 0.255 0.2575 0.26 0.2625 0.265 0.2675
0.27 0.2725 0.275 0.2775 0.28 0.2825 0.285 0.2875 0.29
0.2925 0.295 0.2975 0.3 0.3025 0.305 0.3075 0.31 0.3125
0.315 0.3175 0.32 0.3225 0.325 0.3275 0.33 0.3325 0.335
0.3375 0.34 0.3425 0.345 0.3475 0.35 0.3525 0.355 0.3575
0.36 0.3625 0.365 0.3675 0.37 0.3725 0.375 0.3775 0.38
0.3825 0.385 0.3875 0.39 0.3925 0.395 0.3975 0.4 0.4025
0.405 0.4075 0.41 0.4125 0.415 0.4175 0.42 0.4225 0.425
0.4275 0.43 0.4325 0.435 0.4375 0.44 0.4425 0.445 0.4475
0.45 0.4525 0.455 0.4575 0.46 0.4625 0.465 0.4675 0.47
0.4725 0.475 0.4775 0.48 0.4825 0.485 0.4875 0.49 0.4925
0.495 0.4975 0.5 0.5025 0.505 0.5075 0.51 0.5125 0.515
0.5175 0.52 0.5225 0.525 0.5275 0.53 0.5325 0.535 0.5375
0.54 0.5425 0.545 0.5475 0.55 0.5525 0.555 0.5575 0.56
0.5625 0.565 0.5675 0.57 0.5725 0.575 0.5775 0.58 0.5825
0.585 0.5875 0.59 0.5925 0.595 0.5975 0.6 0.6025 0.605
0.6075 0.61 0.6125 0.615 0.6175 0.62 0.6225 0.625 0.6275
0.63 0.6325 0.635 0.6375 0.64 0.6425 0.645 0.6475 0.65
0.6525 0.655 0.6575 0.66 0.6625 0.665 0.6675 0.67 0.6725
0.675 0.6775 0.68 0.6825 0.685 0.6875 0.69 0.6925 0.695
0.6975 0.7 0.7025 0.705 0.7075 0.71 0.7125 0.715 0.7175
0.72 0.7225 0.725 0.7275 0.73 0.7325 0.735 0.7375 0.74
0.7425 0.745 0.7475 0.75 0.7525 0.755 0.7575 0.76 0.7625
0.765 0.7675 0.77 0.7725 0.775 0.7775 0.78 0.7825 0.785
0.7875 0.79 0.7925 0.795 0.7975 0.8 0.8025 0.805 0.8075
0.81 0.8125 0.815 0.8175 0.82 0.8225 0.825 0.8275 0.83
0.8325 0.835 0.8375 0.84 0.8425 0.845 0.8475 0.85 0.8525
0.855 0.8575 0.86 0.8625 0.865 0.8675 0.87 0.8725 0.875
0.8775 0.88 0.8825 0.885 0.8875 0.89 0.8925 0.895 0.8975
0.9 0.9025 0.905 0.9075 0.91 0.9125 0.915 0.9175 0.92
0.9225 0.925 0.9275 0.93 0.9325 0.935 0.9375 0.94 0.9425
0.945 0.9475 0.95 0.9525 0.955 0.9575 0.96 0.9625 0.965
0.9675 0.97 0.9725 0.975 0.9775 0.98 0.9825 0.985 0.9875
0.99 0.9925 0.995 0.9975 1. ]
Now, for example, I try df[0.47] and get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2133 try:
-> 2134 return self._engine.get_loc(key)
2135 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()
KeyError: 0.47
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-117-76c97f917184> in <module>()
----> 1 df[0.47]
/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
2057 return self._getitem_multilevel(key)
2058 else:
-> 2059 return self._getitem_column(key)
2060
2061 def _getitem_column(self, key):
/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2064 # get column
2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
2067
2068 # duplicate columns & possible reduce dimensionality
/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1384 res = cache.get(item)
1385 if res is None:
-> 1386 values = self._data.get(item)
1387 res = self._box_item_values(item, values)
1388 cache[item] = res
/opt/conda/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
3541
3542 if not isnull(item):
-> 3543 loc = self.items.get_loc(item)
3544 else:
3545 indexer = np.arange(len(self.items))[isnull(self.items)]
/opt/conda/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4238)()
pandas/index.pyx in pandas.index.Int64Engine._check_type (pandas/index.c:8209)()
KeyError: 0.47
I don't understand why this happens.
Upvotes: 2
Views: 2673
Reputation: 394459
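The issue here is float imprecision: the label stored in the index is not exactly 0.47, so an exact lookup fails. (Note also that df[0.47] selects a column, as the _getitem_column frames in the traceback show; row lookups go through df.loc or df.iloc.) You can use the method get_slice_bound to return the ordinal position of that row: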
In [237]:
df.iloc[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[237]:
0 0.854001
Name: 0.47, dtype: float64
We can see the real value of that index label:
In [238]:
df.index[df.index.get_slice_bound(0.47, side='left', kind='loc')]
Out[238]:
0.47000000000000003
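The mismatch can be reproduced without pandas. Here is a minimal sketch, assuming np.linspace(0, 1, 401) computes each element as k * step with step = 1/400 (an assumption about NumPy's internals, although it matches the value shown above); the variable names are only illustrative:

```python
import math

step = 1 / 400          # spacing of the 401 grid points, i.e. 0.0025
label = 188 * step      # the grid point that is displayed as 0.47

# The computed label and the literal 0.47 are two different doubles.
exact_match = (label == 0.47)

# They differ only in the last bit, so a tolerance-based comparison succeeds.
close_match = math.isclose(label, 0.47, rel_tol=1e-12)

print(repr(label), exact_match, close_match)
```

An exact lookup compares with ==, which is why the label is not found even though it prints as 0.47 in most displays.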
Whilst pandas does support Float64Index, using it for exact label lookup is going to be problematic; you'd be better off sticking with the default Int64Index.
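As a rough sketch of that suggestion, the float values can live in an ordinary column while the index stays as the default integers. The column names x and value and the helper position_of are made up for illustration, and the default tolerance of np.isclose is an arbitrary choice:

```python
import numpy as np
import pandas as pd

n = 401
grid = np.linspace(0, 1, n)

# Keep the default integer index; store the float grid as a regular column.
df = pd.DataFrame({"x": grid, "value": np.random.rand(n)})

def position_of(x, n_points=n):
    """Map a grid value to its integer position (hypothetical helper)."""
    return int(round(x * (n_points - 1)))

# Exact integer lookup, no float comparison involved.
row = df.loc[position_of(0.47)]

# Alternatively, select rows by tolerance on the float column.
matches = df[np.isclose(df["x"], 0.47)]
```

Both approaches avoid testing floats for equality: the first works on integers, the second compares within a tolerance.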
get_slice_bound is an undocumented method, but the docstring gives you enough info:
Signature: df.index.get_slice_bound(label, side, kind)
Docstring:
Calculate slice bound that corresponds to given label.

Returns leftmost (one-past-the-rightmost if ``side=='right'``) position
of given label.

Parameters
----------
label : object
side : {'left', 'right'}
kind : {'ix', 'loc', 'getitem'}
You can also use get_loc and pass method='nearest' to achieve the same:
In [240]:
df.iloc[df.index.get_loc(0.47, method='nearest')]
Out[240]:
0 0.854001
Name: 0.47, dtype: float64
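In recent pandas releases the method argument of get_loc is no longer available (as far as I know it was deprecated in 1.4 and dropped in 2.0), so the call above may fail there. Here is a sketch of an equivalent lookup using Index.get_indexer, which accepts method and tolerance; the tolerance of 1e-9 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(401), index=np.linspace(0, 1, 401))

# get_indexer takes a list of labels and returns their positions;
# -1 means there is no label within the tolerance.
pos = df.index.get_indexer([0.47], method="nearest", tolerance=1e-9)[0]

if pos == -1:
    raise KeyError(0.47)

row = df.iloc[pos]       # positional row access
label = df.index[pos]    # the label actually stored in the index
```

Passing a tolerance keeps the nearest-match lookup from silently returning a row for a value that is nowhere near any label.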
Upvotes: 4
Reputation: 140307
The printed representation may be the same, but the underlying value may be slightly different, and then the hash is different too.
Both values are still displayed as 0.47, which is misleading.
=> You cannot index your elements by a float key reliably.
Instead, maybe use decimals as keys, or rounded values.
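One way to sketch the rounded-values idea: round the grid to a fixed number of decimals before using it as the index, so that each label should become the same double as the literal you type. Four decimals is an assumption that fits this particular grid (spacing 0.0025):

```python
import numpy as np
import pandas as pd

# Round the labels so each one is the double nearest to its decimal form.
index = np.round(np.linspace(0, 1, 401), 4)
df = pd.DataFrame(np.random.rand(401), index=index)

found = 0.47 in df.index   # exact membership test on the rounded labels
row = df.loc[0.47]         # row lookup by label (df[0.47] would look for a column)
```

This only helps when the keys you look up are written with the same number of decimals; values produced by further arithmetic would need rounding again before the lookup.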
Upvotes: 2