user1233157

Reputation: 345

what does .dtype do?

I am new to Python, and don't understand what .dtype does.
For example:

>>> aa
array([1, 2, 3, 4, 5, 6, 7, 8])
>>> aa.dtype = "float64"
>>> aa
array([  4.24399158e-314,   8.48798317e-314,   1.27319747e-313,
     1.69759663e-313])

I thought dtype was a property of aa, which should be int, and that if I assign aa.dtype = "float64",
then aa should become array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]).

Why does it change the values and the size?
What does it mean?

I was actually learning from a piece of code, so I'll paste it here:

def to_1d(array):
    """prepares an array into a 1d real vector"""
    a = array.copy()  # copy the array, to avoid changing global
    orig_dtype = a.dtype
    a.dtype = "float64"  # this doubles the size of array
    orig_shape = a.shape
    return a.ravel(), (orig_dtype, orig_shape)  # flatten and return

I think it shouldn't change the values of the input array, only its size. I'm confused about how the function works.

Upvotes: 19

Views: 43460

Answers (4)

the wolf

Reputation: 35532

By changing the dtype in this way, you are changing the way a fixed block of memory is being interpreted.

Example:

>>> import numpy as np
>>> a=np.array([1,0,0,0,0,0,0,0],dtype='int8')
>>> a
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
>>> a.dtype='int64'
>>> a
array([1])

Note how the change from int8 to int64 turned an 8-element, 8-bit integer array into a 1-element, 64-bit array. It is the same 8-byte block, however. On my i7 machine with native (little-endian) byte order, the byte pattern is the same as 1 in int64 format.

Change the position of the 1:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int8')
>>> a.dtype='int64'
>>> a
array([16777216])

Another example:

>>> a=np.array([0,0,0,0,0,0,1,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([0, 0, 0, 1])

Change the position of the 1 in the 32-byte array of 32-bit integers:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([         0, 4294967296,          0,          0]) 

It is the same block of bits reinterpreted.
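
One way to check that only the interpretation changes is to compare the raw bytes before and after the view. This is a minimal sketch; it uses the explicit little-endian dtype '<i8' so the printed value doesn't depend on the machine, and the variable names are only illustrative:

```python
import numpy as np

a = np.array([0, 0, 0, 1, 0, 0, 0, 0], dtype=np.int8)
raw_before = a.tobytes()          # the 8 bytes backing the array

b = a.view('<i8')                 # same buffer, read as one little-endian int64
raw_after = b.tobytes()

print(raw_before == raw_after)    # the bytes are identical
print(b)                          # the 1 sits in byte 3, so the value is 2**24
print(np.shares_memory(a, b))     # no copy was made
```

The value 16777216 from the example above is just 2**24: the 1 sits in the fourth byte of the little-endian integer.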

Upvotes: 6

Joe Kington

Reputation: 284602

First off, the code you're learning from is flawed. It almost certainly doesn't do what the original author thought it did based on the comments in the code.

What the author probably meant was this:

def to_1d(array):
    """prepares an array into a 1d real vector"""
    return array.astype(np.float64).ravel()

However, if array is always going to be an array of complex numbers, then the original code makes some sense.

The only cases where viewing the array (a.dtype = 'float64' is equivalent to doing a = a.view('float64')) would double its length are a complex array (numpy.complex128) or a 128-bit floating point array. For any other dtype, it doesn't make much sense.

For the specific case of a complex array, the original code would convert something like np.array([0.5+1j, 9.0+1.33j]) into np.array([0.5, 1.0, 9.0, 1.33]).

A cleaner way to write that would be:

def complex_to_interleaved_real(array):
    """prepares a complex array into an "interleaved" 1d real vector"""
    return array.copy().view('float64').ravel()

(I'm ignoring the part about returning the original dtype and shape, for the moment.)
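
For completeness, here is a sketch of how the dtype and shape could be carried along so the interleaved vector can be turned back into the complex array. The inverse function from_1d and the (dtype, shape) tuple layout are my own assumptions for illustration, and the sketch assumes the input is complex:

```python
import numpy as np

def to_1d(array):
    """Return an interleaved 1d float64 copy plus what is needed to undo it."""
    a = np.ascontiguousarray(array, dtype=np.complex128)  # assumes complex input
    info = (a.dtype, a.shape)        # record dtype and shape *before* viewing
    return a.copy().view(np.float64).ravel(), info

def from_1d(vector, info):
    """Hypothetical inverse of to_1d."""
    orig_dtype, orig_shape = info
    return vector.view(orig_dtype).reshape(orig_shape)

z = np.array([[0.5 + 1j, 9.0 + 1.33j]])
flat, info = to_1d(z)
print(flat)                  # real and imaginary parts interleaved
print(from_1d(flat, info))   # back to the original complex array
```

Note that the shape is recorded before the dtype is changed. The code in the question reads a.shape after the assignment, so it stores the already-doubled shape.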


Background on numpy arrays

To explain what's going on here, you need to understand a bit about what numpy arrays are.

A numpy array consists of a "raw" memory buffer that is interpreted as an array through "views". You can think of all numpy arrays as views.

Views, in the numpy sense, are just a different way of slicing and dicing the same memory buffer without making a copy.

A view has a shape, a data type (dtype), an offset, and strides. Where possible, indexing/reshaping operations on a numpy array will just return a view of the original memory buffer.

This means that things like y = x.T or y = x[::2] don't use any extra memory, and don't make copies of x.
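
A quick way to convince yourself of this is np.shares_memory, along with writing through the view. A small sketch (the array contents are arbitrary):

```python
import numpy as np

x = np.arange(10)
y = x[::2]                       # every other element, as a view
t = x.reshape(2, 5).T            # reshaped and transposed, still a view

print(np.shares_memory(x, y))    # True: same underlying buffer
print(np.shares_memory(x, t))    # True
print(x.strides, y.strides)      # the view steps through memory twice as far

y[0] = 99                        # writing through the view...
print(x[0])                      # ...changes the original: 99
```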

So, if we have an array similar to this:

import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])

We could reshape it by doing either:

x = x.reshape((2, 5))

or

x.shape = (2, 5)

For readability, the first option is better. They're (almost) exactly equivalent, though. Neither one will make a copy that uses up more memory (the first will result in a new Python object, but that's beside the point at the moment).


Dtypes and views

The same thing applies to the dtype. We can view an array as a different dtype by either setting x.dtype or by calling x.view(...).

So we can do things like this:

import numpy as np
x = np.array([1, 2, 3], dtype=np.int64)

print('The original array')
print(x)

print('\n...Viewed as unsigned 8-bit integers (notice the length change!)')
y = x.view(np.uint8)
print(y)

print('\n...Doing the same thing by setting the dtype')
x.dtype = np.uint8
print(x)

print('\n...And we can set the dtype again and go back to the original.')
x.dtype = np.int64
print(x)

Which yields:

The original array
[1 2 3]

...Viewed as unsigned 8-bit integers (notice the length change!)
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...Doing the same thing by setting the dtype
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...And we can set the dtype again and go back to the original.
[1 2 3]

Keep in mind, though, that this is giving you low-level control over the way that the memory buffer is interpreted.

For example:

import numpy as np
x = np.arange(10, dtype=np.int64)

print('An integer array:', x)
print('But if we view it as a float:', x.view(np.float64))
print("...It's probably not what we expected...")

This yields:

An integer array: [0 1 2 3 4 5 6 7 8 9]
But if we view it as a float: [  0.00000000e+000   4.94065646e-324   
   9.88131292e-324   1.48219694e-323   1.97626258e-323   
   2.47032823e-323   2.96439388e-323   3.45845952e-323
   3.95252517e-323   4.44659081e-323]
...It's probably not what we expected...

So, we're interpreting the underlying bits of the original memory buffer as floats, in this case.

If we wanted to make a new copy with the ints converted to floats, we'd use x.astype(np.float64).
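
To put the two side by side, here is a minimal sketch contrasting astype (convert the values, copy the data) with view (keep the bytes, change the interpretation):

```python
import numpy as np

x = np.arange(4, dtype=np.int64)

converted = x.astype(np.float64)   # new array, values converted: 0.0, 1.0, 2.0, 3.0
viewed = x.view(np.float64)        # same bytes, read as IEEE 754 doubles

print(converted)
print(np.shares_memory(x, converted))   # False: astype copied the data
print(np.shares_memory(x, viewed))      # True: view did not
print(viewed.view(np.int64))            # viewing back recovers the original ints
```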


Complex Numbers

Complex numbers are stored (in C, Python, and numpy alike) as two floats. The first is the real part and the second is the imaginary part.

So, if we do:

import numpy as np
x = np.array([0.5+1j, 1.0+2j, 3.0+0j])

We can see the real (x.real) and imaginary (x.imag) parts. If we convert this to a float, we'll get a warning about discarding the imaginary part, and we'll get an array with just the real part.

print(x.real)
print(x.astype(float))

astype makes a copy and converts the values to the new type.

However, if we view this array as a float, we'll get a sequence of item1.real, item1.imag, item2.real, item2.imag, ....

print(x)
print(x.view(float))

yields:

[ 0.5+1.j  1.0+2.j  3.0+0.j]
[ 0.5  1.   1.   2.   3.   0. ]

Each complex number is essentially two floats, so if we change how numpy interprets the underlying memory buffer, we get an array of twice the length.
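
As a sanity check on that layout, the float view can be reshaped into one (real, imag) row per complex number and compared against x.real and x.imag. A small sketch, assuming the default complex128 dtype:

```python
import numpy as np

x = np.array([0.5 + 1j, 1.0 + 2j, 3.0 + 0j])   # complex128: two float64 per item
pairs = x.view(np.float64).reshape(-1, 2)      # one row per complex number

print(x.itemsize, pairs.itemsize)              # 16 8
print(np.array_equal(pairs[:, 0], x.real))     # True
print(np.array_equal(pairs[:, 1], x.imag))     # True
```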

Hopefully that helps clear things up a bit...

Upvotes: 45

David Heffernan

Reputation: 612964

The documentation for the dtype attribute of ndarray is not very useful at all. Looking at your output it would seem that the buffer of eight 4 byte integers is being reinterpreted as four 8 byte floats.
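
That reading can be checked with a short sketch. It assumes aa held 32-bit little-endian integers (the dtype strings '<i4', '<f8' and '<i8' make that assumption explicit); the question does not state the platform:

```python
import numpy as np

aa = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype='<i4')   # assumed: 32-bit little-endian ints
as_float = aa.view('<f8')                              # same 32 bytes, read as 4 doubles

print(aa.nbytes, as_float.nbytes)   # 32 32
print(as_float.shape)               # (4,)
print(as_float)                     # tiny denormal values, as in the question

# Each double is built from a pair of ints: low word first, then high word.
pairs = aa.astype(np.int64).reshape(4, 2)
combined = pairs[:, 0] + (pairs[:, 1] << 32)
print(np.array_equal(as_float.view('<i8'), combined))
```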

But what you want is to specify the dtype in the array creation:

array([1, 2, 3, 4, 5, 6, 7, 8], dtype="float64")

Upvotes: 2

Ameer

Reputation: 2638

After messing around with it, I think manually assigning dtype does a reinterpret cast rather than what you want. That is, it interprets the existing data directly as floats rather than converting it to floats. You could try aa = numpy.array(list(map(float, aa))) or, more simply, aa = aa.astype(float).

Further explanation: dtype is the type of the data. To quote verbatim from the documentation:

A data type object (an instance of numpy.dtype class) describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.

ints and floats don't have the same bit patterns, meaning you can't just look at the memory for an int and it would be the same number when you look at it as a float. By setting dtype to float64 you are just telling the computer to read that memory as float64 instead of actually converting the integer numbers to floating point numbers.
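
The difference in bit patterns can be seen without numpy at all, using the standard library's struct module. A small sketch, with little-endian byte order chosen explicitly for the illustration:

```python
import struct

# Pack the integer 1 as a 64-bit little-endian int, then read the same
# 8 bytes back as a 64-bit float.
raw = struct.pack('<q', 1)
as_float = struct.unpack('<d', raw)[0]
print(raw.hex())      # 0100000000000000
print(as_float)       # 5e-324, the smallest positive double

# The float 1.0 has a completely different bit pattern.
print(struct.pack('<d', 1.0).hex())   # 000000000000f03f
```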

Upvotes: 3
