Marco Couto

Reputation: 337

Python Pandas: dataframe.loc returns "KeyError: label not in [index]", but dataframe.index shows it is

I'm using the pandas library in Python, and I have an issue.

I have a list of values, lst, and to make it easy let's say it has only the first 20 natural numbers:

>>> lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

I then create a DataFrame, by giving it a Series with that list, like this:

>>> from pandas import DataFrame, Series
>>> df = DataFrame(Series(lst))

And I want to use this to calculate the quantiles from 0.1 (10%) to 1 (100%), and I do it using the quantile function from DataFrame:

>>> import numpy as np
>>> qts = quantiles = df.quantile(np.linspace(.1, 1, num=10, endpoint=True))

If I print quantiles, this is what appears:

        0
0.1   2.9
0.2   4.8
0.3   6.7
0.4   8.6
0.5  10.5
0.6  12.4
0.7  14.3
0.8  16.2
0.9  18.1
1.0  20.0

Now I want to store the values for quantiles 0.3 and 0.7 in variables. After some searching, I came up with a solution using loc on the DataFrame, giving it the quantile label (0.7, for instance) and then the column label of the values I want. Since there's only one column, I do it like this:

>>> q_7 = qts.loc[0.7][0]

The problem is that Python raises this error:

**KeyError: 'the label [0.7] is not in the [index]'**

But I know it exists, since if I try to print the index values, I get this:

>>> qts.index
Float64Index([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype='float64')

So the label apparently exists in the index, but pandas says it doesn't. What am I doing wrong?

If I try to print any other quantile value using this approach, rather than 0.3 or 0.7, it works:

>>> qts.loc[0.1][0]
2.8999999999999999
>>> qts.loc[0.2][0]
4.8000000000000007
>>> qts.loc[0.4][0]
8.6000000000000014
>>> qts.loc[0.5][0]
10.5
>>> qts.loc[0.6][0]
12.4
>>> qts.loc[0.8][0]
16.200000000000003
>>> qts.loc[0.9][0]
18.100000000000001
>>> qts.loc[1][0]
20.0

Any thoughts?

I'm using Python 3.5, and pandas 0.20.3.

EDIT: Thanks for the feedback! So, it's a float precision issue. Nevertheless, I was wondering: is there a better way to get the N-th element of the list of quantiles than using loc as I did?

Upvotes: 1

Views: 3874

Answers (3)

DeepSpace

Reputation: 81604

You are a victim of float precision errors (some float values simply can't be represented in a finite binary form; see "Is floating point math broken?").

While qts.index indeed outputs
Float64Index([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype='float64'),

see what happens next:

>>> for i in qts.index:
...     print(repr(i))
...
0.10000000000000001
0.20000000000000001
0.30000000000000004
0.40000000000000002
0.5
0.59999999999999998
0.70000000000000007
0.80000000000000004
0.90000000000000002
1.0

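On its own, this output doesn't show why qts.loc[0.4][0] works while qts.loc[0.7][0] fails. One possible explanation might be that .loc implements some kind of tolerance for float indexes, i.e. if the error isn't too large it will "allow" access to the requested label. A simpler one is that some labels are exactly the float their literal denotes, while others are not: the literal 0.4 is itself stored as 0.40000000000000002, which matches the index, whereas the literal 0.7 is stored as 0.69999999999999996, which differs from the 0.70000000000000007 in the index. In any case, qts.loc[0.70000000000000007][0] works: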

>>> qts.loc[0.70000000000000007][0]
14.299999999999999
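
Whether a given label can be found with a plain literal can be checked by comparing each generated label with the literal one would type. The following is a minimal sketch, assuming the index comes from the same np.linspace call as in the question; the names labels, literals and mismatches are only illustrative:

```python
import numpy as np

# Labels as generated in the question, and the literals one would type by hand.
labels = np.linspace(0.1, 1, num=10, endpoint=True)
literals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Collect the literals whose generated label is not exactly the same float.
mismatches = []
for label, literal in zip(labels, literals):
    same = float(label) == literal
    print(literal, "stored as", repr(float(label)), "equal:", same)
    if not same:
        mismatches.append(literal)

print("literals that would miss an exact lookup:", mismatches)
```

Any literal listed in mismatches is a candidate for the KeyError, since an exact label lookup compares the floats themselves rather than their printed form.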

Upvotes: 1

Mr Tarsa

Reputation: 6652

As mentioned by others, it's a precision issue. To locate the desired floating-point label in the index, you may want to use np.isclose:

>>> quantiles.loc[np.isclose(quantiles.index, 0.3), 0]
0.3    6.7
Name: 0, dtype: float64
>>> quantiles.loc[np.isclose(quantiles.index, 0.7), 0]
0.7    14.3
Name: 0, dtype: float64
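
Note that the boolean mask returns a one-element Series rather than a scalar. If a plain number is wanted, the lookup could be wrapped as in the sketch below. value_at is a hypothetical helper written for this illustration, not a pandas function, and it relies on the default tolerances of np.isclose:

```python
import numpy as np
import pandas as pd

# Rebuild the quantiles frame from the question.
df = pd.DataFrame(pd.Series(range(1, 21)))
quantiles = df.quantile(np.linspace(0.1, 1, num=10, endpoint=True))


def value_at(frame, label, column=0):
    """Hypothetical helper: return the single value whose float label is close to `label`."""
    mask = np.isclose(frame.index, label)
    matches = frame.loc[mask, column]
    if len(matches) != 1:
        # Zero or several close labels: refuse to guess.
        raise KeyError(label)
    return matches.iloc[0]


q_3 = value_at(quantiles, 0.3)
q_7 = value_at(quantiles, 0.7)
print(q_3, q_7)
```

Raising KeyError when the mask matches zero or several rows keeps the behaviour close to what .loc does for a missing label.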

Upvotes: 1

lcameron05

Reputation: 774

The index value is not exactly equal to 0.7 here; the two differ by a tiny floating-point error. You can confirm this by running the following assertion, which fails:

assert qts.index[6] == 0.7

or

print(qts.index[6] - 0.7)

If you round the index using numpy.round first you will be able to access the element via qts.loc[0.7, 0] as desired:

import numpy as np

qts.index = np.round(qts.index, decimals=1)
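
Put together, the rounding approach might look like the sketch below. It rebuilds qts as in the question and rounds a NumPy copy of the labels (via to_numpy(), which assumes a reasonably recent pandas); the variable names q_3, q_7 and q_7_by_position are only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the quantiles frame from the question.
df = pd.DataFrame(pd.Series(range(1, 21)))
qts = df.quantile(np.linspace(0.1, 1, num=10, endpoint=True))

# Round the labels to one decimal so they match the literals typed by hand.
qts.index = np.round(qts.index.to_numpy(), decimals=1)

# Label-based access now works for every quantile, including 0.3 and 0.7.
q_3 = qts.loc[0.3, 0]
q_7 = qts.loc[0.7, 0]

# Positional access ignores the float labels entirely (row 6 is the 0.7 quantile).
q_7_by_position = qts.iloc[6, 0]
print(q_3, q_7, q_7_by_position)
```

If only the position of the quantile matters, iloc avoids float labels altogether, at the cost of having to know which row corresponds to which quantile.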

Upvotes: 2
