Reputation: 167
Can anyone please help me understand where NumPy's array function infers the data type from?
I understand that it basically infers it from the kind of values assigned to the array.
For Example:
> data = [1,2,3,4]
> arr = np.array(data)
So in the lines above, "arr" will have dtype('int64') or dtype('int32').
What I am trying to understand is how it decides whether to give it int64 or int32.
I understand that this might be a trivial question, but I am just trying to understand how it works, as I was recently asked this in an interview.
Upvotes: 4
Views: 2255
Reputation: 9363
Numeric data types include integers and floats. If an array contains both integers and floating point numbers, numpy will give the entire array a float dtype so the decimal points are not lost.
An integer can never hold a decimal point, so if you cast a value like 2.55 to an integer dtype, it is truncated and stored as 2.
As mentioned by @unutbu, whether you get int32 or int64 depends on your machine, i.e. whether it is a 32-bit or a 64-bit machine.
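The two rules above (float promotion, and truncation when casting down to integers) can be sketched like this:

```python
import numpy as np

# Mixed integers and floats: the whole array is promoted to a float
# dtype so the fractional parts survive.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64
print(mixed)         # [1.  2.  3.5]

# Casting a float to an integer dtype truncates the decimal part,
# so 2.55 really is stored as 2.
print(np.array([2.55]).astype(np.int64))  # [2]
```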
Strings are values that contain numbers and/or characters. For example, a string might be a word, a sentence, or several sentences. If your array has mixed types (numbers and strings), the most general dtype, a string dtype, will be assigned to the array.
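A quick sketch of the mixed-types case (note that on Python 3, NumPy picks a fixed-width unicode dtype such as '<U32' rather than the bytes dtype 'S...'; the exact width depends on your NumPy version):

```python
import numpy as np

# Mixing numbers with a string forces the whole array to a string dtype;
# every element is converted to its string representation.
arr = np.array([1, 2.0, 'a'])
print(arr.dtype.kind)  # 'U' (unicode string) on Python 3
print(arr)             # ['1' '2.0' 'a']
```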
For a complete, detailed look, see the data types page of the scipy/NumPy docs.
Upvotes: 3
Reputation: 231385
In Python 3 (and on a basic 32-bit machine), int32 vs. int64 depends on the size of the input:
In [447]: np.array(123456789)
Out[447]: array(123456789)
In [448]: _.dtype
Out[448]: dtype('int32')
In [449]: np.array(12345678901234)
Out[449]: array(12345678901234, dtype=int64)
From the np.array docs:
dtype: The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to 'upcast' the array.
It looks like int32 is the smallest default int size (at least with my configuration). This is also the value of np.int_.
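A sketch of that behaviour (the exact default is platform-dependent, so the first print may show int32 on Windows or 32-bit builds and int64 elsewhere):

```python
import numpy as np

# The default integer dtype is named by np.int_; small Python ints
# are stored with it.
print(np.array([1, 2, 3]).dtype)  # int32 or int64, platform-dependent
print(np.dtype(np.int_))          # the same dtype

# NumPy widens automatically when a value does not fit the default:
print(np.array(12345678901234).dtype)  # int64
```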
As an example of the disallowed downcast:
In [456]: np.array(12345678901234, dtype=np.int32)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
<ipython-input-456-da7c96e4b0b3> in <module>()
----> 1 np.array(12345678901234, dtype=np.int32)
OverflowError: Python int too large to convert to C long
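The "upcast only" rule the docs describe can also be checked programmatically with np.can_cast, which reports whether a safe (lossless) conversion exists between two dtypes:

```python
import numpy as np

# Upcasting int32 -> int64 is always safe; the reverse downcast is not,
# which is why the dtype=np.int32 call above overflows.
print(np.can_cast(np.int32, np.int64))  # True
print(np.can_cast(np.int64, np.int32))  # False
```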
Upvotes: 2
Reputation: 1546
I think there is a kind of hierarchical treatment, where NumPy uses the most conservative yet all-encompassing type that can "legally" represent the input. If you have only integers, int32/64 preserves all of the elements. As soon as you introduce a float, you need float32/64 to preserve all of the elements of the array, and you can always convert a float back to an int. As soon as you introduce a string, you need a string dtype to legally represent everything in the array, and again you can always convert back to float or int if you need to.
Ex:
>>> array([1]).dtype
dtype('int64')
>>> array([1, 2.0]).dtype
dtype('float64')
>>> array([1, 2.0, 'a']).dtype
dtype('S3')
In short, it is pretty smart about it ;)
Upvotes: 0
Reputation: 879511
Per the docs,
Some types, such as int and intp, have differing bitsizes, dependent on the platforms (e.g. 32-bit vs. 64-bit machines).
So, on 32-bit machines, np.array([1,2,3,4]) returns an array of dtype int32, but on 64-bit machines it returns an array of dtype int64.
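A quick way to inspect what your own platform's default integer is:

```python
import numpy as np

# np.dtype(int) shows the dtype NumPy maps Python's int to on this
# platform; itemsize gives its width in bytes (4 or 8).
default = np.dtype(int)
print(default)               # int32 or int64, platform-dependent
print(default.itemsize * 8)  # 32 or 64
print(np.array([1, 2, 3, 4]).dtype)
```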
Upvotes: 2