Changing the default data type in numpy.asarray

Question

I am using numpy.asarray in my project to handle arrays because it is far more efficient than plain Python lists. I also need to watch memory use when allocating the array, because my program can receive gigabytes of data. While reading about numpy.asarray, I found out that the data type is inferred from the input data unless it is specified. I have the following array:

X = np.asarray([list(map(int, x)) for x in X])

When I print X.dtype, I get int64. Since the array X always contains binary values, 0 or 1, I thought of using dtype=np.int8 to reduce the memory needed for the array. But I am not sure if this is a good idea. Should I stick with the default int64? Could int8 lose precision in some way that I have not thought of?

Thank you.

Marco · Accepted Answer

From the NumPy manual:

Array types and conversions between types
Data type    Description

...
int8         Byte (-128 to 127)
...

If you are only going to put binary values in the array, then int8 will be just fine. Both 0 and 1 are well within its range of -128 to 127, so you won't lose any precision.
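To make the saving concrete, here is a small sketch that builds the same binary data with both dtypes and compares their footprint. The `rows` input is a made-up stand-in for the asker's data, and int64 is requested explicitly because the inferred default integer type can differ by platform.

```python
import numpy as np

# Hypothetical input: each string is one row of binary digits.
rows = ["0101", "1100", "0011"]
data = [[int(ch) for ch in row] for row in rows]

a64 = np.asarray(data, dtype=np.int64)  # 8 bytes per element
a8 = np.asarray(data, dtype=np.int8)    # 1 byte per element

print(a64.nbytes)  # 96 (12 elements * 8 bytes)
print(a8.nbytes)   # 12 (12 elements * 1 byte)

# The stored values are identical; only the storage width differs.
print(np.array_equal(a64, a8))  # True
```

The values compare equal, and the int8 array needs one eighth of the bytes.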

You could also set the data type to bool_. It is stored as one byte per element, so it uses the same memory as int8 while making it explicit that the values are binary, and it still works as an int in arithmetic.

>>> import numpy as np
>>> x = np.asarray([1,0,1,0], dtype=np.bool_)
>>> x
array([ True, False,  True, False], dtype=bool)
>>> x + 2
array([3, 2, 3, 2])
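
One thing worth keeping in mind with the session above: arithmetic on a bool (or int8) array can produce a wider integer array, so the saving applies to the stored array and not necessarily to intermediate results. A rough check, with the exact result dtype left unstated because it depends on platform and NumPy version:

```python
import numpy as np

x = np.asarray([1, 0, 1, 0], dtype=np.bool_)
i8 = x.astype(np.int8)

# bool_ and int8 both use one byte per element.
print(x.itemsize, i8.itemsize)  # 1 1

# Adding a Python int gives an integer array, typically a wider type.
shifted = x + 2
print(shifted.dtype, shifted.itemsize)

# Summing a bool array counts the True entries.
print(int(x.sum()))  # 2
```

If memory is tight, it may help to convert back with astype(np.int8) after such operations, or to avoid creating large temporaries in the first place.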
