Reputation: 7694
I am looking for a way to assign a column of a numpy structured array more efficiently.
Example:
my_col = fn_returning_1D_array(...)
runs more than twice as fast on my machine as the same assignment into a column of a structured array:
test = np.ndarray(shape=(int(8e6),), dtype=np.dtype([('column1', 'S10'), ...more columns...]))
test['column1'] = fn_returning_1D_array(...)
I tried creating test with Fortran ordering, but it did not help; presumably the fields stay interleaved in memory.
Does anybody have any idea here? I would be willing to use low-level numpy interfaces and cython if they could help.
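For concreteness, the gap can be measured with a small benchmark. This is a hypothetical sketch: the dtype, sizes, and the stand-in for fn_returning_1D_array are illustrative, not the original 'S10' layout.

```python
import numpy as np
import timeit

M = int(1e6)
# Hypothetical stand-in for the real dtype; the original uses 'S10' columns.
dt = np.dtype([('column1', 'f8'), ('column2', 'f8')])
test = np.zeros((M,), dtype=dt)

def fn_returning_1D_array():
    # placeholder for the real generating function
    return np.arange(M, dtype='f8')

def plain_assignment():
    my_col = fn_returning_1D_array()           # bind result to a plain name

def structured_assignment():
    test['column1'] = fn_returning_1D_array()  # copy result into a field

t_plain = timeit.timeit(plain_assignment, number=10)
t_struct = timeit.timeit(structured_assignment, number=10)
print(t_plain, t_struct)  # the field assignment pays for an extra strided copy
```

The second timing includes a strided copy into the interleaved record layout, which is what the question is trying to avoid.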
The apparent equivalence of recarray column assignment and "normal" array column assignment holds only if the latter is created in row-major order. With column-major ordering the two assignments are far from equivalent:
Row-major
In [1]: import numpy as np
In [2]: M,N=int(1e7),10
In [4]: A1=np.zeros((M,N),'f')
In [9]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [10]: A2=np.zeros((M,),dtype=dt)
In [11]: X=np.arange(M+0.0)
In [13]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 2.36 s per loop
In [15]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.36 s per loop
In [16]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 334 ms per loop
In [8]: A1.flags
Out[8]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
Column-major
In [1]: import numpy as np
In [2]: M,N=int(1e7),10
In [3]: A1=np.zeros((M,N),'f', 'F')
In [4]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [5]: A2=np.zeros((M,),dtype=dt)
In [6]: X=np.arange(M+0.0)
In [8]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 374 ms per loop
In [9]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.43 s per loop
In [10]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 380 ms per loop
In [11]: A1.flags
Out[11]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
Note that for column-major ordering the two buffers are no longer identical:
In [6]: A3=np.zeros_like(A2)
In [7]: A3.data = A1.data
In [20]: A2[0]
Out[20]: (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
In [21]: A2[1]
Out[21]: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
In [16]: A3[0]
Out[16]: (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
In [17]: A3[1]
Out[17]: (10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0)
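The byte-level difference can also be checked directly by comparing raw buffers; a sketch (the sizes here are illustrative):

```python
import numpy as np

M, N = 100, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Structured array filled field by field: records are stored contiguously
A2 = np.zeros((M,), dtype=dt)
for n in dt.names:
    A2[n] = X

# Row-major 2D array with the same values
A1_c = np.zeros((M, N), 'f', order='C')
A1_c[:, :] = X[:, None]
A1_f = np.asfortranarray(A1_c)  # same values, column-major layout

print(A1_c.tobytes() == A2.tobytes())     # True: rows line up with records
print(A1_f.tobytes('A') == A2.tobytes())  # False: whole columns are contiguous
```

The C-ordered buffer is byte-for-byte the structured array's buffer; the Fortran-ordered one stores each column contiguously, so the records no longer line up.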
Upvotes: 2
Views: 464
Reputation: 231395
These are not equivalent actions. One just generates an array (and assigns it to a variable, a minor action). The other generates the array and fills a column of a structured array.
my_col = fn_returning_1D_array(...)
test['column1'] = fn_returning_1D_array(...)
I think a fairer comparison would be to fill the columns of a 2D array.
In [38]: M,N=1000,10
In [39]: A1=np.zeros((M,N),'f') # 2D array
In [40]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [41]: A2=np.zeros((M,),dtype=dt) # structured array
In [42]: X=np.arange(M+0.0)
In [43]: A1[:,0]=X # fill a column
In [44]: A2['f0']=X # fill a field
In [45]: timeit for n in range(N):A1[:,n]=X
10000 loops, best of 3: 65.3 µs per loop
In [46]: timeit for n in dt.names: A2[n]=X
10000 loops, best of 3: 40.6 µs per loop
I'm a bit surprised that filling the structured array is faster.
Of course the fast way to fill the 2D array is with broadcasting:
In [50]: timeit A1[:,:]=X[:,None]
10000 loops, best of 3: 29.2 µs per loop
But the improvement over filling the fields is not that great.
I don't see anything significantly wrong with filling a structured array field by field. It's got to be faster than generating a list of tuples to fill the whole array.
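To illustrate that claim, here is a sketch comparing the two fill strategies; the field names f0, f1, … come from the comma-joined dtype, and the sizes are illustrative:

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Strategy 1: one vectorized assignment per field
A2 = np.zeros((M,), dtype=dt)
for n in dt.names:
    A2[n] = X

# Strategy 2: build a Python-level list of tuples, one tuple per record
rows = [tuple([x] * N) for x in X]
A4 = np.array(rows, dtype=dt)

# Both give the same records, but strategy 2 loops in Python over all M rows
print(all(np.array_equal(A2[n], A4[n]) for n in dt.names))  # True
```

Strategy 1 does N vectorized copies; strategy 2 constructs M Python tuples, which dominates for large M.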
I believe A1 and A2 have identical data buffers. For example, if I make a zeros copy of A2, I can replace its data buffer with A1's and get a valid structured array:
In [64]: A3=np.zeros_like(A2)
In [65]: A3.data=A1.data
So the faster way of filling the structured array is to do the fastest 2D fill, followed by this data assignment.
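In current NumPy versions, assigning to ndarray.data this way is no longer supported; the same zero-copy reinterpretation can be sketched with view, assuming uniform field dtypes and a C-contiguous source:

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Fastest 2D fill: broadcast down the rows of a C-ordered array
A1 = np.zeros((M, N), 'f')
A1[:, :] = X[:, None]

# Reinterpret the same buffer as one record per row -- no copy is made
A3 = A1.view(dt).reshape(M)

print(np.shares_memory(A1, A3))  # True: A3 is a view of A1's buffer
print(A3[1])                     # record built from row 1 of A1
```

Because A3 is a view, any later write through A1 is visible through the structured fields as well.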
But in the general case the challenge is to create a compatible 2D array. It's easy when all field dtypes are the same. With a mix of dtypes you'd have to work at the byte level. There are some advanced dtype
specifications (with offsets, etc), that may facilitate such a mapping.
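As a hypothetical illustration of such an offset-based specification (the field names and layout here are made up):

```python
import numpy as np

# A record of one little-endian int32 followed by one float32, laid out
# explicitly with offsets inside an 8-byte item.
dt = np.dtype({'names': ['a', 'b'],
               'formats': ['<i4', '<f4'],
               'offsets': [0, 4],
               'itemsize': 8})

buf = np.zeros(3, dtype=dt)
buf['a'] = [1, 2, 3]
buf['b'] = [0.5, 1.5, 2.5]

# The same bytes reinterpreted as raw 32-bit words: column 0 is field 'a',
# column 1 holds the bit pattern of field 'b'.
raw = buf.view('<i4').reshape(3, 2)
print(raw[:, 0])  # [1 2 3]
```

The offsets pin each field to a byte position inside the record, which is the kind of byte-level mapping a mixed-dtype layout would need.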
Now you have shifted the focus to Fortran order. In the case of a 2D array that does help, but it does so at the expense of row-oriented operations.
In [89]: A1=np.zeros((M,N),'f',order='F')
In [90]: timeit A1[:,:]=X[:,None]
100000 loops, best of 3: 18.2 µs per loop
One thing that you haven't mentioned, at least not before the last rewrite of the question, is how you intend to use this array. If it is just a place to store a number of arrays by name, you could use a straightforward Python dictionary:
In [96]: timeit D={name:X.copy() for name in dt.names}
10000 loops, best of 3: 25.2 µs per loop
Though this really is a time test for X.copy().
In any case, there isn't any equivalent to Fortran order when dealing with dtypes. None of the array operations like reshape, swapaxes, strides, or broadcasting cross the 'dtype' boundary.
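That said, later NumPy versions (1.16 onward) added helpers in numpy.lib.recfunctions that bridge that boundary; a sketch:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype(','.join(['f'] * 3))
A2 = np.zeros((4,), dtype=dt)
for i, n in enumerate(dt.names):
    A2[n] = i  # fill field n with the constant i

# For uniform field dtypes this returns a plain 2D array over the same data,
# so ordinary operations (transpose, reshape, broadcasting) apply again.
U = rfn.structured_to_unstructured(A2)
print(U.shape)  # (4, 3)
print(U[0])     # [0. 1. 2.]
```

This doesn't make the structured array itself Fortran-ordered, but it recovers a 2D view on which the usual axis operations work.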
Upvotes: 1