bshah

Reputation: 167

How does NumPy infer the dtype for an array?

Can anyone please help me understand how NumPy's array function infers the data type?

I understand that it basically infers it from the kind of values passed to the array.

For example:

> data = [1,2,3,4]
> arr = np.array(data)

After the lines above, "arr" will have either dtype('int64') or dtype('int32').

What I am trying to understand is how it decides whether to use int64 or int32.

It might be a trivial question, but I would like to understand how this works, as I was recently asked about it in an interview.

Upvotes: 4

Views: 2255

Answers (4)

Srivatsan

Reputation: 9363

Numeric data types include integers and floats.

If an array contains both integers and floating point numbers, NumPy gives the entire array a float dtype so that the fractional parts are not lost.

An integer never has a fractional part. So, for example, if an integer dtype is forced, 2.55 is stored as 2.
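A minimal sketch of the two points above, assuming a reasonably recent NumPy on Python 3. The variable names are only for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])        # only integers: an integer dtype is inferred
mixed = np.array([1, 2.55, 3])    # one float is enough to make the whole array float

print(ints.dtype.kind)    # 'i' (the exact width depends on the platform)
print(mixed.dtype)        # float64

# Casting back to an integer dtype drops the fractional part (truncation).
truncated = mixed.astype(int)
print(truncated)          # [1 2 3]
```

Note that the truncation only happens on the explicit cast; the inferred float array still holds 2.55.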

As mentioned by @unutbu, whether you get int32 or int64 depends on your platform, i.e. whether it is a 32-bit or a 64-bit machine.

Strings are values made up of characters, which may include digits. For example, a string might be a word, a sentence, or several sentences. If your array has mixed types (numbers and strings), NumPy falls back to the most general dtype, a string dtype, for the whole array.
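As a rough illustration of the mixed-type case, assuming Python 3 (where the inferred string dtype is a unicode one, kind 'U'):

```python
import numpy as np

# Numbers mixed with a string: every element is converted to a string.
mixed = np.array([1, 2.5, "abc"])
print(mixed.dtype.kind)   # 'U'
print(mixed.tolist())     # ['1', '2.5', 'abc']

# Asking for dtype=object explicitly keeps the original Python objects instead.
kept = np.array([1, 2.5, "abc"], dtype=object)
print(type(kept[0]), type(kept[2]))
```

The exact string width reported in the dtype (the number after 'U') can vary, so it is safer to check `dtype.kind` than the full dtype name.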

For a complete, detailed explanation, see the SciPy docs.

Upvotes: 3

hpaulj

Reputation: 231385

In Python 3 (on a basic 32-bit machine), int32 vs. int64 depends on the size of the input:

In [447]: np.array(123456789)
Out[447]: array(123456789)

In [448]: _.dtype
Out[448]: dtype('int32')

In [449]: np.array(12345678901234)
Out[449]: array(12345678901234, dtype=int64)

From the np.array docs:

dtype: The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to 'upcast' the array.

It looks like int32 is the smallest default int size (at least with my configuration). That is also the type of np.int_.

As an example of the disallowed downcast:

In [456]: np.array(12345678901234, dtype=np.int32)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-456-da7c96e4b0b3> in <module>()
----> 1 np.array(12345678901234, dtype=np.int32)

OverflowError: Python int too large to convert to C long
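Here is a small sketch of the size-based inference that should behave the same on 32-bit and 64-bit setups. `fits_in` is a hypothetical helper written for this example, not a NumPy function; it only uses `np.iinfo` to look up the range of an integer dtype.

```python
import numpy as np

small = np.array(123456789)   # fits the platform's default integer type
big = np.array(2**40)         # needs 64 bits, whatever the platform
huge = np.array(2**63)        # too large for int64, still fits uint64
too_big = np.array(2**64)     # no fixed-width integer fits: object dtype

print(small.dtype, big.dtype, huge.dtype, too_big.dtype)


def fits_in(value, dtype):
    """Hypothetical helper: True if the Python int is representable in dtype."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max


print(fits_in(12345678901234, np.int32))   # False
print(fits_in(12345678901234, np.int64))   # True
```

Checking the range up front like this avoids relying on what happens when an out-of-range value is forced into a smaller dtype, which may differ between NumPy versions.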

Upvotes: 2

isosceleswheel

Reputation: 1546

I think there is some kind of hierarchical treatment, where it uses the most conservative type that can still "legally" represent all of the input. If you only have integers, int32/64 preserves all of the elements. As soon as you introduce a float, you need float32/64 to preserve all of the elements of the array, and you can always convert a float back to an int. As soon as you introduce a string, you need strings to legally represent everything in the array, and again, you can always convert back to float or int if you need to.

Example:

>>> from numpy import array
>>> array([1]).dtype
dtype('int64')
>>> array([1, 2.0]).dtype
dtype('float64')
>>> array([1, 2.0, 'a']).dtype
dtype('S3')

In short, it is pretty smart about it ;)
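For the numeric part of that hierarchy, NumPy exposes the promotion rules through `np.result_type`, so you can ask directly which dtype two types would be combined into. A short sketch (the particular type pairs are just my picks for illustration):

```python
import numpy as np

# Each call reports the dtype that can hold both inputs.
int_pair = np.result_type(np.int32, np.int64)
int_float = np.result_type(np.int64, np.float64)
narrow = np.result_type(np.int32, np.float32)
with_complex = np.result_type(np.float64, np.complex64)

print(int_pair)       # int64
print(int_float)      # float64
print(narrow)         # float64 (float32 cannot hold every int32 exactly)
print(with_complex)   # complex128
```

This matches what `np.array` infers for lists that mix those kinds of numbers, e.g. `np.array([1, 2.0])` ends up as float64.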

Upvotes: 0

unutbu

Reputation: 879511

Per the docs,

Some types, such as int and intp, have differing bitsizes, dependent on the platforms (e.g. 32-bit vs. 64-bit machines).

So, on 32-bit machines, np.array([1,2,3,4]) returns an array of dtype int32, but on 64-bit machines it returns an array of dtype int64.
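If you want to see what your own setup does, a quick check along these lines should work. It compares the pointer size reported by the standard library's `struct` module with the integer dtype NumPy picks by default.

```python
import struct

import numpy as np

pointer_bits = struct.calcsize("P") * 8          # 32 or 64 on common platforms
default_int = np.array([1, 2, 3, 4]).dtype       # what np.array infers for small ints

print("pointer size (bits):", pointer_bits)
print("default integer dtype:", default_int)
print("np.int_ :", np.dtype(np.int_))
print("np.intp :", np.dtype(np.intp))
```

One caveat, as far as I know: the default integer is not tied to the pointer size on every configuration. On 64-bit Windows with NumPy versions before 2.0, for instance, the default integer is still 32 bits wide, so it is worth printing the values above rather than assuming.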

Upvotes: 2
