Reputation: 1267
Ok, I have a pandas dataframe like this:
         lat     long  level        date   time  value
3341  29.232  -15.652   10.0  20100109.0  700.0    0.5
3342  27.887  -13.668  120.0  20100109.0  700.0    3.2
...
3899  26.345  -11.234    0.0  20100109.0  700.0    5.8
The reason for the strange index numbers is that the dataframe comes from a csv converted to pandas, with some rows filtered out. The columns level, date and time are not really relevant.
I am trying, in ipython, to see some rows filtered by latitude, so I do (if the dataframe is c):
c[c['lat'] == 26.345]
or
c.loc[c['lat'] == 26.345]
and I can see whether the value is present or not, but sometimes it outputs nothing for latitude values that I can see in the dataframe!? For instance, the dataframe shows a latitude of 27.702, but when I do c[c['lat'] == 27.702] or c.loc[c['lat'] == 27.702] I get an empty dataframe, even though the value is right there. What is happening here?
Thank you.
Upvotes: 2
Views: 484
Reputation: 22473
This is probably because you are asking for an exact match against floating point values, which is very, very dangerous. They are approximations, often printed to less precision than actually stored.
It's very easy to see 0.735471 printed, say, and think that's all there is, when in fact the value is really 0.73547122072282867; the display function has simply truncated the result. But when you try a strict equality test on the attractively short value, boom. Doesn't work.
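You can see the same thing happen directly in pandas; a minimal sketch (assuming the default display precision of 6 digits):
import pandas as pd
s = pd.Series([0.73547122072282867])
print(s)                      # displays 0.735471, truncated for printing
print(s.iloc[0])              # the full stored value, with more digits than shown above
print((s == 0.735471).any())  # False: the short printed value is not what is stored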
Instead of
c[c['lat'] == 26.345]
Try:
import numpy as np
c[np.isclose(c['lat'], 26.345)]
Now you'll get values that are within a certain range of the value you specified. You can set the tolerance.
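For example, a sketch with an explicit tolerance (the value here is just illustrative):
# rtol=1e-05 and atol=1e-08 are np.isclose's defaults; widen atol if your
# latitudes are only meaningful to about three decimal places
c[np.isclose(c['lat'], 26.345, atol=1e-3)]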
Upvotes: 4
Reputation: 2544
In addition to the answers addressing comparison on floating point values, some of the values in your lat column may be string type instead of numeric.
EDIT: You indicated that this is not the problem, but I'll leave this response here in case it helps someone else. :)
Use the to_numeric() function from pandas to convert them to numeric.
import pandas as pd
df['lat'] = pd.to_numeric(df['lat'])
# you can adjust the errors parameter as you need
df['lat'] = pd.to_numeric(df['lat'], errors='coerce')
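To check whether this applies to your data, a quick sketch (assuming your dataframe is df):
print(df['lat'].dtype)  # 'object' usually means strings are mixed in; float64 is numeric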
Upvotes: 2
Reputation: 8067
It is a bit difficult to give a precise answer, as the question does not contain a reproducible example, but let me try. Most probably, this is due to floating-point issues. It is possible that the number you see (and try to compare with) is not the same number that is stored in memory, due to rounding. For example:
import numpy as np
x = 0.1
arr = np.array([x + x + x])
print(np.array([x + x + x]))
# [ 0.3]
print(arr[arr == 0.3])
# []
print(x + x + x)
# 0.30000000000000004
# in fact 0.1 is not exactly equal to 1/10,
# so 0.1 + 0.1 + 0.1 is not equal to 0.3
You can overcome this issue by using np.isclose instead of ==:
print(np.isclose(arr, 0.3))
# [ True]
print(arr[np.isclose(arr, 0.3)])
# [ 0.3]
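Applied to your case, a sketch assuming the dataframe is named c as in the question:
print(c[np.isclose(c['lat'], 27.702)])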
Upvotes: 2