Numpy: Permanent changes to using numpy.ndarray.view?

Question

If I wanted to change the data type of a numpy array permanently, is reassignment the best way?

Here's an example to illustrate the syntax:

import numpy as np
x = np.array([1],dtype='float')
x = x.view(dtype=[('x', 'float')])

However, I'd prefer to change the data type "in place".

Is there any way to change the dtype of an array in-place? Or is it better to do something similar to this?

x = x.view(dtype=[('x', 'float')]).copy()

Joe Kington · Accepted Answer

There is no "permanent" dtype, really.

Numpy arrays are basically just a way of viewing a memory buffer.

Views, in the numpy sense, are just a different way of slicing and dicing the same memory buffer without making a copy.
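As a rough illustration of that point, the sketch below checks that a view and its parent array really do sit on the same buffer. The variable names are made up for the example; `np.shares_memory` and the `.base` attribute are standard NumPy.

```python
import numpy as np

x = np.arange(4, dtype=np.int64)   # 4 elements * 8 bytes = 32 bytes
v = x.view(np.uint8)               # the same 32 bytes, seen as unsigned bytes

print(np.shares_memory(x, v))      # True: no data was copied
print(v.base is x)                 # True: the view keeps a reference to its parent
print(x.nbytes, v.nbytes)          # 32 32
print(v.shape)                     # (32,)
```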

Keep in mind, though, that this is giving you low-level control over the way that the memory buffer is interpreted.

For example:

import numpy as np
# The np.int and np.float aliases were removed in NumPy 1.24; use explicit sizes.
x = np.arange(10, dtype=np.int64)

print('An integer array:', x)
print('But if we view it as a float:', x.view(np.float64))
print("...It's probably not what we expected...")

This yields:

An integer array: [0 1 2 3 4 5 6 7 8 9]
But if we view it as a float: [  0.00000000e+000   4.94065646e-324   
   9.88131292e-324   1.48219694e-323   1.97626258e-323   
   2.47032823e-323   2.96439388e-323   3.45845952e-323
   3.95252517e-323   4.44659081e-323]
...It's probably not what we expected...

So, we're interpreting the underlying bits of the original memory buffer as floats, in this case.

If we wanted to make a new copy with the ints converted to floats, we'd use x.astype(np.float64).
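To make the difference concrete, here is a small side-by-side sketch of the two calls. The names `converted` and `reinterpreted` are just labels chosen for this example.

```python
import numpy as np

x = np.arange(3, dtype=np.int64)

converted = x.astype(np.float64)     # new buffer, values preserved
reinterpreted = x.view(np.float64)   # same buffer, bits reread as floats

print(np.array_equal(converted, x))        # True: 0.0, 1.0, 2.0
print(np.array_equal(reinterpreted, x))    # False: the bits of 1 are not the float 1.0
print(np.shares_memory(x, converted))      # False
print(np.shares_memory(x, reinterpreted))  # True
```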

Views with a different dtype are so useful because they let you do some really nice tricks without duplicating anything in memory. (Slicing returns a view as well, though that's a separate topic.)

For example, if you wanted to convert ints to floats in place (without duplicating memory), you can use a couple of tricks with views to do this, provided both dtypes have the same item size. (Based on @unutbu's answer on this question.)

import numpy as np
x = np.arange(10, dtype=np.int64)
print('The original int array:', x)

# We could have just used y = x.astype(np.float64), but that makes a copy.
# This doesn't. If we're worried about memory consumption, it's handy!
y = x.view(np.float64)
y[:] = x

print('The new float array:', y)
print('But, the old int array has been modified in-place')
print(x)
print("They're both views into the same memory buffer!")
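The trick only lines up element for element when the source and target dtypes occupy the same number of bytes. A minimal sketch of a guard for that follows; `to_float64_inplace` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

def to_float64_inplace(arr):
    """Hypothetical helper (not part of NumPy): refill arr's buffer with float64 values."""
    if arr.dtype.itemsize != np.dtype(np.float64).itemsize:
        raise ValueError("item sizes differ, so the buffer cannot be reused element for element")
    out = arr.view(np.float64)  # same memory, read as float64
    out[:] = arr                # cast each integer and write it back into that memory
    return out

x = np.arange(5, dtype=np.int64)
y = to_float64_inplace(x)

print(y)                       # the integer values, now stored as floats
print(np.shares_memory(x, y))  # True: y lives in x's buffer
print(x)                       # the same bits read as int64, no longer 0..4
```

After the call, `x` should be treated as scratch space: it still points at the buffer, but its integer interpretation of the float bits is meaningless.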

Similarly, you can do various low-level bit-twiddling:

import numpy as np
x = np.arange(10, dtype=np.uint16)
y = x.view(np.uint8)

print('The original array:', x)
print('...Viewed as uint8:', y)
print('...Which we can do some tricks with', y[::2])

# Now let's interpret the uint16's as two streams of uint8's...
a, b = y[::2], y[1::2]
b[:] = np.arange(10, dtype=np.uint8)
print(a)
print(b)
print(x)
print(y)
# Notice that the original is modified... We're not duplicating memory anywhere!
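Which of the two byte streams holds the low byte depends on the byte order of the dtype. The sketch below pins that down by asking for little-endian uint16 explicitly ('<u2'), which is an assumption added for this example rather than something the code above does.

```python
import numpy as np

x = np.arange(4, dtype='<u2')   # explicitly little-endian uint16: 0, 1, 2, 3
y = x.view(np.uint8)            # bytes in memory order: low, high, low, high, ...

low, high = y[::2], y[1::2]
high[:] = 1                     # set the high byte of every element to 1

print(x)                        # [256 257 258 259]
```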

To answer your question, "better" is all relative. Do you want a copy, or do you want to view the same memory buffer in a different way?

As a side note, astype makes a copy by default, regardless of the "input" and "output" dtypes (pass copy=False to skip the copy when no conversion is needed). It's often what people actually want when they reach for view. (e.g. if you want to convert between ints and floats, use astype, not view, unless you need to micro-manage memory usage.)
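One caveat worth checking for yourself: the copy is astype's default behaviour (copy=True), and it can be switched off. A short sketch, using made-up variable names:

```python
import numpy as np

x = np.arange(3, dtype=np.float64)

a = x.astype(np.float64)               # default copy=True: new buffer even though the dtype matches
b = x.astype(np.float64, copy=False)   # nothing to convert, so the input array itself comes back

print(a is x, np.shares_memory(a, x))  # False False
print(b is x)                          # True
```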
