Selection of a Series in Python Pandas

Question

I am trying to select rows of a pandas Series with a boolean comparison. However, I ran into the following problem:

>>> q=pandas.Series([0.5,0.5,0,1,0.5,0.5])
>>> q
0    0.5
1    0.5
2    0.0
3    1.0
4    0.5
5    0.5
dtype: float64

>>> (q-0.3).abs()
0    0.2
1    0.2
2    0.3
3    0.7
4    0.2
5    0.2
dtype: float64

>>> (q-0.7).abs()
0    0.2
1    0.2
2    0.7
3    0.3
4    0.2
5    0.2
dtype: float64

>>> (q-0.3).abs() > (q-0.7).abs()          # This is what I expected:
0     True                                 # False
1     True                                 # False
2    False                                 # False
3     True                                 # True
4     True                                 # False
5     True                                 # False
dtype: bool

>>> (q-0.3).abs() == (q-0.7).abs()
0    False
1    False
2    False
3    False
4    False
5    False
dtype: bool

Apparently, "0.2" is not greater than "0.2"...

Why is the output different from what I expect?

Andy · Accepted Answer

This is a floating point problem. It is described very well in this question.

To directly answer your problem, look at the element at index 1 in each of your two tests. The values are not equal.

>>> (q-0.7).abs()[1]
0.19999999999999996
>>> (q-0.3).abs()[1]
0.20000000000000001
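
The same difference can be reproduced without pandas. The following is a minimal sketch, assuming Python 3 with IEEE 754 double-precision floats; the names a and b are only for illustration. Passing a float to Decimal shows the exact binary value the float stores, which makes the mismatch visible.

```python
from decimal import Decimal

# The two distances from 0.5, computed with plain Python floats
a = abs(0.5 - 0.3)
b = abs(0.5 - 0.7)

# Decimal(float) converts the stored binary value exactly,
# so these two lines print slightly different numbers
print(Decimal(a))
print(Decimal(b))

# a is slightly larger than b, which is why ">" gives True and "==" gives False
print(a > b)
print(a == b)
```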

We can get your results though, with a little bit of manipulation and by utilizing the decimal module.

>>> from decimal import Decimal, getcontext
>>> import pandas
>>> s = [0.5,0.5,0,1,0.5,0.5]
>>> dec_s = [Decimal(x) for x in s]
>>> q = pandas.Series(dec_s)
>>> q
0    0.5
1    0.5
2      0
3      1
4    0.5
5    0.5
dtype: object
>>> getcontext().prec
28
>>> getcontext().prec = 2
>>> (q-Decimal(0.3)).abs() > (q-Decimal(0.7)).abs()
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool

A few things to note:

The list of values is converted from float to decimal data types before being added to the Series.
The dtype is now object instead of float64. This is because numpy doesn't handle Decimal types directly.
The default precision of the decimal context is 28 significant digits (not 28 places after the decimal point). I've reduced it to 2. Without that, Decimal(0.3) and Decimal(0.7) carry over the exact binary value of the float they were built from, so we end up with long float-like numbers. The smaller precision rounds the results and matches your data set.
The 0.3 and 0.7 values used in the comparison must also be Decimals, otherwise you will see an error similar to unsupported operand type(s) for +: 'Decimal' and 'float'.
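
If you would rather keep the float64 dtype, a tolerance-based comparison is another option. This is only a sketch, not part of the decimal approach above: it assumes numpy is available and that the default tolerances of numpy.isclose suit your data. The names a, b and strictly_greater are hypothetical.

```python
import numpy as np
import pandas as pd

q = pd.Series([0.5, 0.5, 0, 1, 0.5, 0.5])
a = (q - 0.3).abs()
b = (q - 0.7).abs()

# Treat values within floating-point tolerance as equal, so only
# differences larger than the tolerance count as "greater"
strictly_greater = (a > b) & ~np.isclose(a, b)

print(strictly_greater)
```

With the data from the question, only index 3 comes out True, which matches the expected column. The trade-off is that you have to pick a tolerance (the rtol and atol arguments of numpy.isclose) that is appropriate for the scale of your values.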
