Reputation: 337
I'm using the pandas toolkit in Python, and I've run into an issue.
I have a list of values, lst
, and to make it easy let's say it has only the first 20 natural numbers:
>>> lst = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
I then create a DataFrame
, by giving it a Series
with that list, like this:
>>> df = DataFrame(Series(lst))
And I want to use this to calculate the quantiles from 0.1 (10%) to 1 (100%), and I do it using the quantile
function from DataFrame:
>>> quantiles = df.quantile(np.linspace(.1,1,num=10,endpoint=True))
If I print quantiles
, this is what appears:
0
0.1 2.9
0.2 4.8
0.3 6.7
0.4 8.6
0.5 10.5
0.6 12.4
0.7 14.3
0.8 16.2
0.9 18.1
1.0 20.0
Now I want to store the values for quantiles 0.3 and 0.7 in variables. After searching for how to do it, I came up with a solution using loc
in the DataFrame
, giving it the quantile label (0.7
, for instance) and the column index of the series of values I want to consider. Since there's only one column, I do it like this (qts is the quantiles DataFrame from above):
>>> q_7 = qts.loc[0.7][0]
The problem is that Python gives me this error:
**KeyError: 'the label [0.7] is not in the [index]'**
But I know it exists, since if I try to print the index
values, I get this:
>>> qts.index
Float64Index([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype='float64')
So the index label apparently exists, but pandas says it doesn't. What am I doing wrong?
If I try to print any other quantile value using this approach, rather than 0.3
or 0.7
, it works:
>>> qts.loc[0.1][0]
2.8999999999999999
>>> qts.loc[0.2][0]
4.8000000000000007
>>> qts.loc[0.4][0]
8.6000000000000014
>>> qts.loc[0.5][0]
10.5
>>> qts.loc[0.6][0]
12.4
>>> qts.loc[0.8][0]
16.200000000000003
>>> qts.loc[0.9][0]
18.100000000000001
>>> qts.loc[1][0]
20.0
Any thoughts?
I'm using Python 3.5, and pandas 0.20.3.
EDIT
Thanks for the feedback!
So, it's a float precision issue. Nevertheless, I was wondering: is there a better way to get the N'th element of the list of quantiles, rather than use loc
as I did?
Upvotes: 1
Views: 3874
Reputation: 81604
You are a victim of float precision errors (some float values simply can't be represented in a finite binary form, see Is floating point math broken?).
While qts.index
indeed outputs
Float64Index([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype='float64')
,
see what happens next:
>>> for i in qts.index:
...     print(repr(i))
0.10000000000000001
0.20000000000000001
0.30000000000000004
0.40000000000000002
0.5
0.59999999999999998
0.70000000000000007
0.80000000000000004
0.90000000000000002
1.0
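At first glance this doesn't explain why qts.loc[0.4][0]
works and qts.loc[0.7][0]
doesn't. The likely reason is that the 17-digit values printed for most labels are just the full representation of the same double that the short literal produces (0.40000000000000002 and 0.4 are the same float), whereas the labels for 0.3 and 0.7 are genuinely different doubles from the literals 0.3 and 0.7, so the exact lookup in .loc
misses them. Using the stored value directly, qts.loc[0.70000000000000007][0]
works: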
>>> qts.loc[0.70000000000000007][0]
14.299999999999999
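To see which labels actually differ from the literals you would type, you can compare them directly. This is only a small sketch: it rebuilds the labels with the same np.linspace call as in the question, and the names labels, literals and mismatched are made up for illustration.

```python
import numpy as np

# Same quantile labels as in the question.
labels = np.linspace(0.1, 1, num=10, endpoint=True)

# The literals a user would naturally type into .loc.
literals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Exact float comparison, which is what a label lookup relies on.
mismatched = [lit for lab, lit in zip(labels, literals) if lab != lit]

print(mismatched)
```

With the label values printed above, only 0.3 and 0.7 should end up in mismatched, which matches the two lookups that raise KeyError.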
Upvotes: 1
Reputation: 6652
As mentioned by others, it's a precision issue. To locate the desired floating-point number in the index, you may want to use np.isclose
>>> quantiles.loc[np.isclose(quantiles.index, 0.3), 0]
0.3 6.7
Name: 0, dtype: float64
>>> quantiles.loc[np.isclose(quantiles.index, 0.7), 0]
0.7 14.3
Name: 0, dtype: float64
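If you need the scalar rather than a one-element Series, the same idea can be wrapped in a small helper. This is a sketch under assumptions: quantile_at is a hypothetical name (not a pandas function), the data is rebuilt from the question, and the default np.isclose tolerances are assumed to be tight enough for labels spaced 0.1 apart.

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question.
df = pd.DataFrame(pd.Series(range(1, 21)))
quantiles = df.quantile(np.linspace(0.1, 1, num=10, endpoint=True))


def quantile_at(frame, q, column=0):
    """Return the value whose index label is approximately equal to q.

    Hypothetical helper: raises KeyError unless exactly one label matches.
    """
    mask = np.isclose(np.asarray(frame.index, dtype=float), q)
    matches = frame.loc[mask, column]
    if len(matches) != 1:
        raise KeyError(q)
    return matches.iloc[0]


q_30 = quantile_at(quantiles, 0.3)
q_70 = quantile_at(quantiles, 0.7)
print(q_30, q_70)
```

Raising when zero or several labels match keeps a too-loose tolerance from silently returning the wrong row.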
Upvotes: 1
Reputation: 774
The index value is not exactly equal to 0.7 here; it differs by a very small amount. You can confirm this by running the following, which raises an AssertionError:
assert qts.index[6] == 0.7
or
print(qts.index[6] - 0.7)
If you first round the index using numpy.round
, you will be able to access the element via qts.loc[0.7, 0]
as desired:
import numpy as np
qts.index = np.round(qts.index, decimals=1)
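Put together as a runnable sketch, with the data rebuilt from the question. The last part also touches on the follow-up about getting the N'th quantile: positional access with iloc is my own suggestion rather than something proposed above, and it sidesteps float labels entirely.

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question.
df = pd.DataFrame(pd.Series(range(1, 21)))
qts = df.quantile(np.linspace(0.1, 1, num=10, endpoint=True))

# Round the labels to one decimal so they equal the literals 0.1 ... 1.0.
qts.index = np.round(np.asarray(qts.index, dtype=float), decimals=1)

# Label-based lookups now succeed for every quantile.
q_30 = qts.loc[0.3, 0]
q_70 = qts.loc[0.7, 0]

# Assumption: if you only need "the N'th quantile", positional access
# avoids float comparison altogether (row 2 is the third label, 0.3).
third = qts.iloc[2, 0]

print(q_30, q_70, third)
```

Rounding only helps when the labels are meant to be round numbers; for arbitrary quantiles, the positional lookup is likely the more robust choice.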
Upvotes: 2