Reputation: 10322
I'm battling some floating-point problems in the pandas read_csv function. In my investigation, I found this:
In [15]: a = 5.9975
In [16]: a
Out[16]: 5.9975
In [17]: np.float64(a)
Out[17]: 5.9974999999999996
Why do Python's built-in float and NumPy's np.float64 type give different results? I thought they were both C++ doubles?
Upvotes: 67
Views: 150748
Reputation: 23041
NumPy's float64 dtype inherits from Python's float, which is implemented internally as a C double. You can verify that as follows:
isinstance(np.float64(5.9975), float) # True
So even if their string representation is different, the values they store are the same.
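As a quick check of that claim, the sketch below compares the two objects by value, by bit pattern, and by their text when both are formatted the same way. The variable names are illustrative only, and it assumes NumPy is installed and imported as np.

```python
import numpy as np

a = 5.9975
b = np.float64(a)

# The two objects compare equal ...
same_value = bool(a == b)

# ... and their hexadecimal form, which shows the stored bits exactly, matches.
same_bits = a.hex() == float(b).hex()

# Formatting both with 17 significant digits gives identical text, so any
# difference seen on screen comes from the repr, not from the stored value.
same_text = format(a, ".17g") == format(b, ".17g")

print(same_value, same_bits, same_text)
```

If all three flags are True, the only thing that can differ between the two is how each type chooses to print itself.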
On the other hand, np.float32 implements C float (which has no analog in pure Python), and no NumPy int dtype (np.int32, np.int64, etc.) inherits from Python int, because in Python 3 int is unbounded:
isinstance(np.float32(5.9975), float) # False
isinstance(np.int32(1), int) # False
Why use np.float64 at all?
np.float64 defines most of the attributes and methods of np.ndarray. From the following code, you can see that np.float64 implements all but 4 of the public attributes and methods of an ndarray:
[m for m in set(dir(np.array([]))) - set(dir(np.float64())) if not m.startswith("_")]
# ['argpartition', 'ctypes', 'partition', 'dot']
So if you have a function that expects to use ndarray methods, you can pass an np.float64 to it, whereas a plain float will not work.
For example:
def my_cool_function(x):
    return x.sum()
my_cool_function(np.array([1.5, 2])) # <--- OK
my_cool_function(np.float64(5.9975)) # <--- OK
my_cool_function(5.9975) # <--- AttributeError
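If such a function also has to accept plain Python floats, one possible workaround (not part of the answer above, just a sketch) is to convert the argument with np.asarray first. The name my_tolerant_function is hypothetical.

```python
import numpy as np

def my_tolerant_function(x):
    # np.asarray wraps a scalar in a 0-d array and converts lists to arrays,
    # so .sum() is available whatever numeric input was passed in.
    return np.asarray(x).sum()

total_array = my_tolerant_function(np.array([1.5, 2]))
total_np_scalar = my_tolerant_function(np.float64(5.9975))
total_py_float = my_tolerant_function(5.9975)  # no AttributeError here

print(total_array, total_np_scalar, total_py_float)
```

The trade-off is one extra conversion on every call, which is usually negligible for scalars.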
Upvotes: 1
Reputation: 798526
>>> numpy.float64(5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
>>> (5.9975).hex()
'0x1.7fd70a3d70a3dp+2'
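They are the same number. What differs is their representation: the native Python type prints the shortest decimal string that converts back to the same double (a "sane" representation), while the NumPy type in the question prints 17 significant digits of the stored value (a more literal one).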
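The difference between the two printed forms can be reproduced with the standard library alone. The sketch below formats the same Python float three ways; the variable names are illustrative only.

```python
from decimal import Decimal

a = 5.9975

short_text = repr(a)            # shortest string that round-trips to the same double
long_text = format(a, ".17g")   # 17 significant digits of the stored value
exact_value = Decimal(a)        # exact decimal expansion of the stored binary value

print(short_text)
print(long_text)
print(exact_value)

# Both strings parse back to exactly the same double.
round_trips = float(short_text) == float(long_text) == a
print(round_trips)
```

The first line printed is 5.9975 and the second is 5.9974999999999996, which matches the two outputs shown in the question.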
Upvotes: 67