Reputation: 43
I have the following data structure:
import numpy as np

N = 100
TB = {'names': ('n', 'D'), 'formats': (int, int)}
TA = {'names': ('id', 'B'), 'formats': (int, np.dtype((TB, (N,))))}
a = np.empty(1000, dtype=TA)
b = np.empty(N, dtype=TB)
where a is a structured array with two fields: 'id' and 'B'. 'B' holds another structured array with fields 'n' and 'D', e.g.
for i in range(1000):
    a['B'][i] = b
When the above assignment is executed, the data from b is copied into a. Is there a way to copy just a reference to b, so that when I change b, the change is reflected in a['B'][i]? What I want is to store pointers to b in a, because I don't need copies: the data in b is identical for every row of a.
I tried
TA = {'names':('id', 'B'),'formats':(int, object)}
and it works, but it breaks the nested structure of the arrays. Is there a way to retain structured array functionality, e.g. a['B']['D']?
Thanks
Upvotes: 4
Views: 1946
Reputation: 16776
Yes, you can just open a view. But it works the other way around from what you described: the view is taken from the larger array, and writes through the view change that array:
>>> a = np.array([1,2,3,4,5,6])
>>> b = a[2:4].view()
>>> b[0] = 0
>>> b[1] = 0
>>> a
array([1, 2, 0, 0, 5, 6])
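To relate this to the structured arrays in the question, here is a minimal sketch (N is shrunk to 4 and the variable names are only illustrative). It suggests that assigning b into a copies the data, while a slice of a is a view onto a's own memory:

```python
import numpy as np

N = 4  # small N for illustration; the question uses 100
TB = np.dtype([('n', int), ('D', int)])
TA = np.dtype([('id', int), ('B', TB, (N,))])

a = np.zeros(3, dtype=TA)
b = np.zeros(N, dtype=TB)

# Assignment copies b's data into row 0 of a.
a['B'][0] = b
b['D'][0] = 7
copied = int(a['B'][0]['D'][0])    # a keeps its own copy, unaffected by b

# A view goes the other way: row is a window onto a's memory.
row = a['B'][0]
row['D'][0] = 42
shared = np.shares_memory(row, a)
seen_in_a = int(a['B']['D'][0, 0])  # the write through row shows up in a
```

So a view can alias part of a, but it cannot make a row of a alias the separate array b.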
Upvotes: 0
Reputation: 150957
The short answer is no. Although the syntax for numpy arrays looks the same as standard python syntax, what's happening behind the scenes is very different. Complex numpy datatypes like TA use large blocks of contiguous memory to store each record; the memory has to be laid out regularly, or everything falls apart.
So when you create a 1000-item array with a nested datatype like TA, you're actually allocating 1000 blocks of memory, each of which is large enough to contain N distinct TB records. That's exactly why you can do things like a['B']['D'] or, to put a fine point on it, things like this:
>>> (a['B'][1]['D'] == a['B']['D'][1]).all()
True
>>> a['B'][1]['D'][0] = 123456789
>>> (a['B'][1]['D'] == a['B']['D'][1]).all()
True
For normal Python objects, the above would fail, because object item access order matters. It's actually very weird that this is possible in numpy, and the only reason it's possible is that numpy uses uniformly structured contiguous memory.
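The layout can be inspected directly. The following is a rough sketch, assuming 8-byte integers (np.int64 is spelled out here so the sizes are the same on every platform) and the default packed, unaligned layout:

```python
import numpy as np

N = 100
TB = np.dtype([('n', np.int64), ('D', np.int64)])
TA = np.dtype([('id', np.int64), ('B', TB, (N,))])

a = np.zeros(1000, dtype=TA)

tb_size = TB.itemsize             # 2 * 8 bytes per TB record
ta_size = TA.itemsize             # 8 bytes for 'id' plus N TB records
b_shape = a['B'].shape            # one row of N TB records per item of a
d_strides = a['B']['D'].strides   # step of one TA record, then one TB record

# Both indexing orders land on the same bytes inside a.
same_bytes = np.shares_memory(a['B'][1]['D'], a['B']['D'][1])
```

Every record occupies the same fixed number of bytes, so there is no slot in which a row could hold a pointer to an outside array instead of its own data.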
As far as I know, numpy doesn't provide any way to do what you're asking (someone correct me if I'm wrong!), and the indirection required would probably involve significant changes to numpy's API.
I'll add that I don't think it makes a lot of sense to do this anyway. If only one copy of the array is needed, why not just store it outside the array? You could even pass it around along with the numpy array, as part of a tuple or namedtuple.
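A minimal sketch of that idea follows. The names Bundle, ids and B are hypothetical, chosen only for this example; the per-row data is reduced to the 'id' values and the single shared TB array travels alongside it:

```python
from collections import namedtuple

import numpy as np

N = 100
TB = np.dtype([('n', int), ('D', int)])

# Hypothetical container: per-row ids plus the one shared TB array.
Bundle = namedtuple('Bundle', ['ids', 'B'])

b = np.zeros(N, dtype=TB)
bundle = Bundle(ids=np.arange(1000), B=b)

# The tuple holds a reference to b, not a copy, so changes to b are visible.
b['D'][0] = 5
seen = int(bundle.B['D'][0])
is_same_object = bundle.B is b
```

Structured field access such as bundle.B['D'] still works, and only one copy of the TB data exists.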
Upvotes: 4