Edy Bourne

Reputation: 6187

Understanding references: why is this NumPy assignment not working?

I have a little test code like so:

import numpy as np

foo = np.zeros(1, dtype=int)
bar = np.zeros((10, 1), dtype=int)


foo_copy = np.copy(foo)
bar[-1] = foo_copy

foo_copy[-1] = 10

print(foo_copy)
print(bar)

I was expecting both foo_copy and the last element of bar to contain the value 10, but instead the last element of bar is still an np array with value 0 in it.

[10]
[[0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]
 [0]]  # <<--- why not 10?

Isn't that last element pointing to foo_copy?

Or does NumPy copy the data over on every assignment, so that I can't change it through the original ndarray?

If so, is there a way to keep that last element as a pointer to foo_copy?

Upvotes: 0

Views: 53

Answers (1)

hpaulj

Reputation: 231395

A numpy array holds numeric values, not references (at least for numeric dtypes).

Make a 1d array, and reshape it to 2d:

In [64]: bar = np.arange(12).reshape(4,3)
In [65]: bar
Out[65]: 
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

Another 1d array:

In [66]: foo = np.array([10])
In [67]: foo
Out[67]: array([10])

This assignment is by value:

In [68]: bar[1,1] = foo
In [69]: bar
Out[69]: 
array([[ 0,  1,  2],
       [ 3, 10,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
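
The same check with the arrays from the question, as a minimal sketch. np.shares_memory is used here only as a diagnostic to show that the two arrays keep separate buffers:

```python
import numpy as np

foo_copy = np.zeros(1, dtype=int)
bar = np.zeros((10, 1), dtype=int)

bar[-1] = foo_copy      # copies the value 0 into bar's own buffer
foo_copy[-1] = 10       # changes foo_copy only

shares = np.shares_memory(bar, foo_copy)
print(shares)           # False: the arrays have separate buffers
print(foo_copy)         # [10]
print(bar[-1])          # [0]
```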

So is this, though the value is broadcast to the whole row:

In [70]: bar[2] = foo
In [71]: bar
Out[71]: 
array([[ 0,  1,  2],
       [ 3, 10,  5],
       [10, 10, 10],
       [ 9, 10, 11]])

We can view the 2d array as 1d. This is a closer representation of how the values are actually stored (but in a C byte array, 12*8 bytes long):

In [72]: bar1 = bar.ravel()
In [73]: bar1
Out[73]: array([ 0,  1,  2,  3, 10,  5, 10, 10, 10,  9, 10, 11])

Changing an element of the view changes the corresponding element of the 2d array:

In [74]: bar1[3] = 30
In [75]: bar
Out[75]: 
array([[ 0,  1,  2],
       [30, 10,  5],
       [10, 10, 10],
       [ 9, 10, 11]])
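
Applied to the question, one option is to reverse the direction: rather than assigning an array into bar, take a view of bar's last row and write through it. A minimal sketch, where the name last is only illustrative:

```python
import numpy as np

bar = np.zeros((10, 1), dtype=int)

last = bar[-1]          # basic indexing returns a view, not a copy
last[-1] = 10           # writes through to bar's buffer

print(last)                          # [10]
print(bar[-1])                       # [10]
print(np.shares_memory(bar, last))   # True
```

This only works with basic (integer and slice) indexing; indexing with a list or a boolean mask produces a copy.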

While we can make object dtype arrays, which store references as lists do, they do not have any performance benefits.
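
For illustration, here is a sketch of the object dtype route with the question's foo_copy. Each slot of the array holds a Python reference, so the stored array and foo_copy are the same object:

```python
import numpy as np

foo_copy = np.zeros(1, dtype=int)

bar = np.empty(3, dtype=object)   # slots hold Python references (initially None)
bar[-1] = foo_copy                # stores a reference to foo_copy, no copy made
foo_copy[-1] = 10                 # visible through bar[-1] as well

print(bar[-1])                    # [10]
print(bar[-1] is foo_copy)        # True
```

The cost is that bar is now effectively a list of Python objects, so whole-array arithmetic on it runs at Python speed.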

The bytestring containing the 'raw data' of bar:

In [76]: bar.tobytes()
Out[76]: b'\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x1e\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00\x00\x00\x00\x00'

The fabled numpy speed comes from working with this raw data in compiled C code. Accessing individual elements from Python code is relatively slow. It's the whole-array operations like bar*3 that are fast.
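
A small sketch of the two styles, producing the same result. The names fast and slow are only illustrative, and no timings are claimed here; the difference can be measured with timeit on larger arrays:

```python
import numpy as np

bar = np.arange(12).reshape(4, 3)

fast = bar * 3                      # one whole-array operation in compiled code

slow = np.empty_like(bar)           # same result, one element at a time from Python
for i in range(bar.shape[0]):
    for j in range(bar.shape[1]):
        slow[i, j] = bar[i, j] * 3

print(np.array_equal(fast, slow))   # True
```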

Upvotes: 2
