maryam roayaee

Reputation: 43

Numpy Nested Structured Arrays by reference

I have the following data structure:

import numpy as np

N = 100
TB = {'names': ('n', 'D'), 'formats': (int, int)}
TA = {'names': ('id', 'B'), 'formats': (int, np.dtype((TB, (N,))))}
a = np.empty(1000, dtype=TA)
b = np.empty(N, dtype=TB)

where a is a structured array with two fields: 'id' and 'B'. In 'B' another structured array with fields 'n' and 'D' is stored, e.g.

for i in range(0,1000):
   a['B'][i] = b

When the above assignment is executed, the data from b is copied to a. Is there a way to copy just the reference to b, so that when I change b, the change is reflected in a['B'][i]? What I want is to store pointers to b in a, because I don't need to create copies as the data in b is identical for every row of a.

I tried

TA = {'names':('id', 'B'),'formats':(int, object)}

and it works, but breaks the nested structure of the arrays. Is there a way to retain structured array functionality, e.g. a['B']['D']?

Thanks

Upvotes: 4

Views: 1946

Answers (2)

Davoud Taghawi-Nejad

Reputation: 16776

Yes, you can just open a view. But it works the other way around from what you described: b has to be a view into a, rather than a holding a reference to b.

>>> a = np.array([1,2,3,4,5,6])
>>> b = a[2:4].view()
>>> b[0] = 0
>>> b[1] = 0
>>> a
array([1, 2, 0, 0, 5, 6])
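Applied to the nested dtype from the question, the same idea might look like the sketch below. It assumes 64-bit integer fields (the question uses plain int) and zero-initialised data so the result is easy to check. Note that b is a view of a single row of a, so this does not share one b across all 1000 rows.

```python
import numpy as np

N = 100
# Assumption: explicit int64 fields stand in for the question's plain `int`.
TB = np.dtype({'names': ('n', 'D'), 'formats': (np.int64, np.int64)})
TA = np.dtype({'names': ('id', 'B'), 'formats': (np.int64, (TB, (N,)))})

a = np.zeros(1000, dtype=TA)

# b is a view into row 0 of a['B'], not an independent array.
b = a['B'][0]

# Writing through the view changes the memory owned by a.
b['D'][:] = 7

row0_changed = bool((a['B'][0]['D'] == 7).all())
row1_untouched = bool((a['B'][1]['D'] == 0).all())
print(row0_changed, row1_untouched)
```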

Upvotes: 0

senderle

Reputation: 150957

The short answer is no. Although the syntax for numpy arrays looks the same as standard python syntax, what's happening behind the scenes is very different. Complex numpy datatypes like TA use large blocks of contiguous memory to store each record; the memory has to be laid out regularly, or everything falls apart.

So when you create a 1000-item array with a nested datatype like TA, you're actually allocating 1000 blocks of memory, each of which is large enough to contain N distinct TB records. That's exactly why you can do things like a['B']['D'] -- or, to put a fine point on it, things like this:

>>> (a['B'][1]['D'] == a['B']['D'][1]).all()
True
>>> a['B'][1]['D'][0] = 123456789
>>> (a['B'][1]['D'] == a['B']['D'][1]).all()
True

For normal Python objects, the above would fail, because object item access order matters. It's actually very weird that this is possible in numpy, and the only reason it's possible is that numpy uses uniformly structured contiguous memory.
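To make the memory layout concrete, here is a small sketch that inspects the record sizes. It assumes 64-bit integer fields and NumPy's default packed (unaligned) struct layout; with a different integer width the numbers change but the arithmetic is the same.

```python
import numpy as np

N = 100
# Assumption: explicit int64 fields, so every field is 8 bytes.
TB = np.dtype({'names': ('n', 'D'), 'formats': (np.int64, np.int64)})
TA = np.dtype({'names': ('id', 'B'), 'formats': (np.int64, (TB, (N,)))})

a = np.zeros(1000, dtype=TA)

tb_size = TB.itemsize   # two 8-byte fields per TB record
ta_size = TA.itemsize   # 8 bytes for 'id' plus N TB records
total = a.nbytes        # 1000 TA records in one buffer

# Field access returns views onto the same buffer, which is why
# a['B'][1]['D'] and a['B']['D'][1] address the same bytes.
b_field = a['B']
shares = np.shares_memory(a, b_field)

print(tb_size, ta_size, total, b_field.shape, shares)
```

Every row carries its own N TB records inline, so there is no slot in the record where a pointer to a shared b could live.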

As far as I know, numpy doesn't provide any way to do what you're asking (someone correct me if I'm wrong!), and the indirection required would probably involve significant changes to numpy's API.

I'll add that I don't think it makes a lot of sense to do this anyway. If only one copy of the array is needed, why not just store it outside the array? You could even pass it around along with the numpy array, as part of a tuple or namedtuple.
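A minimal sketch of that suggestion, assuming a hypothetical namedtuple called Table with fields ids and shared_b (neither name comes from the question). The np.broadcast_to call at the end is an optional extra, not part of the suggestion above: it gives a read-only (1000, N) view of the single b, so field access such as rows['D'] still works without copying.

```python
from collections import namedtuple

import numpy as np

N = 100
# Assumption: explicit int64 fields stand in for the question's plain `int`.
TB = np.dtype({'names': ('n', 'D'), 'formats': (np.int64, np.int64)})

# Hypothetical container: per-row ids plus the single shared b.
Table = namedtuple('Table', ['ids', 'shared_b'])

ids = np.arange(1000)
b = np.zeros(N, dtype=TB)
table = Table(ids=ids, shared_b=b)

# The tuple holds a reference to b, so changes to b are visible through it.
b['D'][0] = 42
seen = int(table.shared_b['D'][0])

# Optional: a read-only view that looks like one copy of b per row.
rows = np.broadcast_to(table.shared_b, (len(ids), N))
b['D'][1] = 7
last_row_value = int(rows['D'][999, 1])

print(seen, last_row_value, rows.shape, rows.flags.writeable)
```

The broadcast view cannot be written to, so updates have to go through b itself.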

Upvotes: 4
