oz123

Reputation: 28848

dtype usage when converting list to numpy array

I am quite confused by the dtype argument when creating a numpy array. I am creating the arrays from a list of floating-point values, most of them stored as strings. First, note that this is not a printing issue, because I have already set np.set_printoptions(precision=18).

This is a part of my list:

In [37]: boundary
Out[37]: 
[['3366307.654296875', '5814192.595703125'],
 ['3366372.2244873046875', '5814350.752685546875'],
 ['3366593.37969970703125', '5814844.73492431640625'],
 ['3367585.4779052734375', '5814429.293701171875'],
 ['3367680.55389404296875', '5814346.618896484375'],
 ....
 [ 3366307.654296875     ,  5814192.595703125     ]]

Then I convert it to a numpy array:

In [43]: boundary2=np.asarray(boundary, dtype=float)   
In [44]: boundary2
Out[44]: 
array([[ 3366307.654296875     ,  5814192.595703125     ],
       [ 3366372.2244873046875 ,  5814350.752685546875  ],
       [ 3366593.37969970703125,  5814844.73492431640625],
        ....
       [ 3366307.654296875     ,  5814192.595703125     ]])
# the full number of significant digits is preserved. 
# this also works with:
In [45]: boundary2=np.array(boundary, dtype=float)

In [46]: boundary2
Out[46]: 
array([[ 3366307.654296875     ,  5814192.595703125     ],
       [ 3366372.2244873046875 ,  5814350.752685546875  ],
       [ 3366593.37969970703125,  5814844.73492431640625],
       ...
       [ 3366307.654296875     ,  5814192.595703125     ]])

# This also works with dtype=np.float (an alias of the built-in float; deprecated in NumPy 1.20 and removed in 1.24)
In [56]: boundary3=np.array(boundary, dtype=np.float)
In [57]: boundary3
Out[57]: 
array([[ 3366307.654296875     ,  5814192.595703125     ],
       [ 3366372.2244873046875 ,  5814350.752685546875  ],
       [ 3366593.37969970703125,  5814844.73492431640625],
       ....
       [ 3366307.654296875     ,  5814192.595703125     ]])
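The float64 arrays reproduce every digit because each of these particular strings has a short binary fraction after the decimal point (a multiple of 2**-14 or coarser), so it is stored exactly in a 64-bit double. A minimal sketch to check this with the standard-library decimal module; the `samples` list is just a few values copied from the list above, not the full data:

```python
from decimal import Decimal

import numpy as np

# A few of the coordinate strings from the list above.
samples = [
    "3366307.654296875",
    "5814192.595703125",
    "3366372.2244873046875",
    "5814350.752685546875",
    "3366593.37969970703125",
    "5814844.73492431640625",
]

# NumPy parses the strings directly when a float dtype is requested.
arr64 = np.array(samples, dtype=np.float64)

# Decimal(float) gives the exact value of the stored binary double, so
# equality with Decimal(s) means the string was stored without rounding.
exact = [Decimal(float(x)) == Decimal(s) for x, s in zip(arr64, samples)]
print(exact)  # [True, True, True, True, True, True] for these values

# By contrast, most decimal fractions are not exactly representable.
print(Decimal(float("0.1")) == Decimal("0.1"))  # False
```

With other data (for example values like 0.1) float64 would also round, just far less visibly than float32.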

Here is why I am confused: if I use dtype=np.float32, it seems like I am losing significant digits:

In [58]: boundary4=np.array(boundary, dtype=np.float32)   
In [59]: boundary4
Out[59]: 
array([[ 3366307.75,  5814192.5 ],
       [ 3366372.25,  5814351.  ],
       [ 3366593.5 ,  5814844.5 ],
       [ 3367585.5 ,  5814429.5 ],
       ...
       [ 3366307.75,  5814192.5 ]], dtype=float32)

I say "seems" because the arrays apparently are the same. I can't see the difference directly, but np.allclose returns True:

In [65]: np.allclose(boundary2, boundary4)
Out[65]: True
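To see what np.allclose is actually comparing, here is a minimal sketch using two sample rows copied from the list (not the full data). The arrays do differ, but the differences fall well inside allclose's default relative tolerance:

```python
import numpy as np

# A few sample rows from the list above (strings, as in the original data).
coords = [["3366307.654296875", "5814192.595703125"],
          ["3366372.2244873046875", "5814350.752685546875"]]

b64 = np.array(coords, dtype=np.float64)
b32 = np.array(coords, dtype=np.float32)

# The arrays are not identical: float32 rounded every value.
print(np.array_equal(b64, b32))  # False
print(np.abs(b64 - b32).max())   # largest rounding error, below 0.25 here

# allclose() tests |a - b| <= atol + rtol * |b|, with rtol=1e-05 and atol=1e-08
# by default, so values of a few million may differ by roughly 30 to 60.
print(np.allclose(b64, b32))                     # True
print(np.allclose(b64, b32, rtol=0, atol=1e-9))  # False
```

If exact agreement matters, np.array_equal or an explicit, tight atol is the stricter check.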

If you have read this far, I hope you see why I am confused. Maybe someone can answer the following two questions:

  1. Why is dtype=float32 "hiding" my data?
  2. Should I be concerned about it, or can I safely continue using dtype=float?

Upvotes: 0

Views: 3273

Answers (1)

Sven Marnach

Reputation: 601421

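All floating-point types have limited precision. The number of significant digits they can store depends on the number of bits in the type. If you pass float, numpy.float or numpy.float64 as dtype, 64 bits are used ("double precision"), giving about 16 significant decimal digits. For numpy.float32, 32 bits are used ("single precision"), giving only about 7 significant decimal digits. Your coordinates already have 7 digits before the decimal point, so float32 can only store them to the nearest 0.25 (values around 3.4 million) or 0.5 (values around 5.8 million), which is exactly what the printed float32 array shows. So nothing is "hidden"; you simply see the effects of limited floating-point precision. numpy.allclose() returns True because its default tolerances (rtol=1e-05, atol=1e-08) are loose compared with these rounding errors: a difference of at most 0.25 on values of several million is a relative error below 1e-7.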
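A short sketch that makes these limits concrete, using NumPy's np.finfo (machine parameters of a float type) and np.spacing (gap to the next representable value). The sample coordinate is taken from the question's list:

```python
import numpy as np

# Significand size and the matching number of decimal digits for each type.
for t in (np.float32, np.float64):
    bits = np.finfo(t).nmant + 1  # stored bits plus the implicit leading 1
    print(f"{t.__name__}: {bits}-bit significand, ~{bits * np.log10(2):.1f} decimal digits")

# np.spacing(x) is the distance from x to the next representable value.
print(np.spacing(np.float32(3366307.0)))  # 0.25 for float32 near 3.4 million
print(np.spacing(np.float32(5814192.0)))  # 0.5 for float32 near 5.8 million
print(np.spacing(np.float64(3366307.0)))  # about 4.7e-10 for float64

# Converting one coordinate shows the rounding to the nearest multiple of 0.25.
x32 = np.float32(3366307.654296875)
print(float(x32))  # 3366307.75
```

The spacing, not the number of printed digits, is what decides how much of a coordinate survives: at this magnitude float32 cannot represent anything finer than a quarter or half unit.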

Upvotes: 4
