Reputation: 81
I just started using NumPy and noticed something strange. (I read the official Quickstart Tutorial, but it didn't help.) This is the code:
>>> jok = np.int16(33)
>>> jok.dtype
dtype('int16')
>>> jok += 1
>>> jok
34
>>> jok.dtype
dtype('int64')
When I apply arithmetic operations to a variable (jok), its dtype changes from 'int16' to 'int64'. But when I apply the same operations to arrays, the dtype stays the same:
>>> ar = np.arange(6,dtype='int8')
>>> ar
array([0, 1, 2, 3, 4, 5], dtype=int8)
>>> ar += 10
>>> ar
array([10, 11, 12, 13, 14, 15], dtype=int8)
Why does this happen?
Is it possible to apply arithmetic operations to a variable like 'jok' while preserving the specified dtype of the variable (in my case 'int16')?
And why does it always change to 'int64'? I know 'int64' is NumPy's default integer type, but I want to save some memory by making the type of my variables smaller.
Are there any reasons for me to stay with 'int64', knowing that my maximum value will not even reach 1,000? Most of my variables will be below 200 ('jok' will always be < 400).
Upvotes: 0
Views: 194
Reputation: 231335
__array_priority__ may explain the pattern you see.
First, the scalar created by np.int16:
In [303]: jok = np.int16(33)
In [304]: jok.__array_priority__
Out[304]: -1000000.0
and the priority of an array created from a python int:
In [305]: np.array(1).__array_priority__
Out[305]: 0.0
In this addition the int is first converted to an np.array; its priority is higher than that of jok, so the dtype is changed:
In [306]: jok += 1
In [307]: jok.dtype
Out[307]: dtype('int64')
In [308]: type(jok)
Out[308]: numpy.int64
Adding a float changes dtype to float - again based on priority:
In [309]: jok += 3.2
In [310]: jok
Out[310]: 37.2
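The outputs above come from one particular NumPy install, and the result of mixing a NumPy scalar with a plain Python number depends on the version. As far as I know, NumPy 2.x adopted the NEP 50 promotion rules, under which a Python int no longer forces the result up to the default integer type. A minimal sketch for checking what your own install does (the names int_result and float_result are just illustrative):

```python
import numpy as np

# NumPy scalars are immutable, so `jok += 1` computes `jok + 1`
# and rebinds the name to a brand-new scalar object.
jok = np.int16(33)
jok += 1
int_result = jok.dtype      # version dependent: may stay int16 or widen

jok = np.int16(33)
jok += 3.2
float_result = jok.dtype    # a floating-point dtype

# Print the version next to the dtypes so the behaviour can be compared.
print(np.__version__, int_result, float_result)
```

If the first dtype printed is int16, your install keeps the small type for scalar-plus-Python-int arithmetic; if it is wider, you are seeing the behaviour shown in this answer.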
But if we make a 0-d array with int16 dtype:
In [311]: jok = np.array(33, 'int16')
In [312]: jok.__array_priority__
Out[312]: 0.0
In [313]: jok += 1
In [314]: jok.dtype
Out[314]: dtype('int16')
In [315]: jok += 3.2
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
<ipython-input-315-28d0135066df> in <module>
----> 1 jok += 3.2
UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int16') with casting rule 'same_kind'
Adding the int preserves the dtype, but trying to add a float results in a casting error. jok+3.2 produces a float, but that can't be put into the int16 array.
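The 0-d array behaviour described here can be sketched as a small script. It relies on the fact that the casting error is a subclass of TypeError; the variable names are illustrative only, and the explicit astype call is one possible way to get a float result, not the only one.

```python
import numpy as np

jok = np.array(33, dtype=np.int16)   # 0-d array, not a NumPy scalar
jok += 1                             # in-place add writes back into the int16 buffer
dtype_after_int = jok.dtype

try:
    jok += 3.2                       # float64 result cannot be stored in int16
    float_add_failed = False
except TypeError:                    # UFuncTypeError derives from TypeError
    float_add_failed = True

# If a float result is wanted, convert explicitly instead of adding in place.
as_float = jok.astype(np.float64) + 3.2

print(dtype_after_int, float_add_failed, as_float)
```

The in-place operator on an array reuses the existing storage, which is why the dtype cannot change and why the float addition is refused rather than silently truncated.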
As a general rule, I don't recommend creating variables with np.int16(...) (or other such functions). Use the np.array(.., dtype) function instead.
The two classes have many of the same methods, but aren't identical. I don't think there's a good reason to make the np.int16 object directly:
In [317]: type(np.int16(33)).__mro__
Out[317]:
(numpy.int16,
numpy.signedinteger,
numpy.integer,
numpy.number,
numpy.generic,
object)
In [318]: type(np.array(33, 'int16'))
Out[318]: numpy.ndarray
In [319]: type(np.array(33, 'int16')).__mro__
Out[319]: (numpy.ndarray, object)
np.int16 objects are created indirectly by indexing an array:
In [320]: type(np.array(33, 'int16')[()])
Out[320]: numpy.int16
But we seldom try to do things like += on such a variable.
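To make the array-versus-scalar distinction concrete, here is a short sketch of how scalars fall out of indexing. The names arr0, arr1, scalar, element and py_value are illustrative and not part of the original session.

```python
import numpy as np

arr0 = np.array(33, dtype=np.int16)   # 0-d array
scalar = arr0[()]                     # empty-tuple index extracts a NumPy scalar
py_value = arr0.item()                # .item() converts to a plain Python int

arr1 = np.arange(6, dtype=np.int16)   # 1-d array
element = arr1[2]                     # indexing one element also gives a NumPy scalar

print(type(scalar), type(py_value), type(element))
```

In both cases the array keeps its int16 dtype; only the extracted value becomes a standalone scalar object.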
Upvotes: 3
Reputation: 903
The main question here is probably: do you need to cast your ints as 'int16' / etc.? My thought is that 99% of the time, you probably don't need to worry about it.
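As a rough illustration of where the dtype matters for memory, the sketch below compares the data buffers of two arrays of 1000 elements. The array names and the size 1000 are arbitrary choices for the example.

```python
import numpy as np

small = np.zeros(1000, dtype=np.int16)
large = np.zeros(1000, dtype=np.int64)

# nbytes counts only the data buffer: elements * bytes per element.
print(small.nbytes, large.nbytes)   # 2000 8000

# A single int16 value holds just 2 bytes of data, so the saving
# per individual variable is tiny compared with the saving per array.
print(np.int16(33).itemsize)        # 2
```

The saving scales with the number of elements, so it is worth thinking about for large arrays and much less so for a handful of standalone variables.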
But to answer your question, if you want to ensure that your data types remain the same, it seems that you'll need to wrap your plain ints in np.int16(). For example:
jok = np.int16(33)
jok += np.int16(1)
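A slightly fuller sketch of the same idea, with a check that the type really is preserved. The names promoted and limit are illustrative; np.result_type and np.iinfo are standard NumPy functions.

```python
import numpy as np

jok = np.int16(33)
jok += np.int16(1)                 # both operands are int16, so the result is int16
print(type(jok), jok)

# result_type reports the dtype NumPy would pick for a pair of operand types.
promoted = np.result_type(np.int16, np.int16)
print(promoted)

# Staying in int16 means staying inside its range.
limit = np.iinfo(np.int16).max
print(limit)                       # 32767
```

Since the values in the question stay below 1,000, the int16 range is not a concern here, but it is the trade-off to keep in mind when choosing a smaller type.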
As for why it happens, I unfortunately can't answer that. You could dig around in the C code under the hood of NumPy if you really want to find out: https://github.com/numpy/numpy
Upvotes: 1