ARF

Reputation: 7694

Fortran ordered (column-major) numpy structured array possible?

I am looking for a way to more efficiently assign the columns of a numpy structured array.

Example:

my_col = fn_returning_1D_array(...)

executes more than two times faster on my machine than the same assignment to the column of a structured array:

test = np.ndarray(shape=(int(8e6),), dtype=np.dtype([('column1', 'S10'), ...more columns...]))
test['column1'] = fn_returning_1D_array(...)

I tried creating test with Fortran ordering, but it did not help. Presumably the fields stay interleaved in memory.

Does anybody have any idea here? I would be willing to use low-level numpy interfaces and cython if they could help.


Edit 1: in response to hpaulj's answer

The apparent equivalence of recarray column assignment and "normal" array column assignment holds only if the latter is created with row-major order. With column-major ordering the two assignments are far from equivalent:

Row-major

In [1]: import numpy as np

In [2]: M,N=int(1e7),10

In [4]: A1=np.zeros((M,N),'f')

In [9]: dt=np.dtype(','.join(['f' for _ in range(N)]))

In [10]: A2=np.zeros((M,),dtype=dt)

In [11]: X=np.arange(M+0.0)

In [13]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 2.36 s per loop

In [15]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.36 s per loop

In [16]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 334 ms per loop

In [8]: A1.flags
Out[8]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Column-major

In [1]: import numpy as np

In [2]: M,N=int(1e7),10

In [3]: A1=np.zeros((M,N),'f', 'F')

In [4]: dt=np.dtype(','.join(['f' for _ in range(N)]))

In [5]: A2=np.zeros((M,),dtype=dt)

In [6]: X=np.arange(M+0.0)

In [8]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 374 ms per loop

In [9]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.43 s per loop

In [10]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 380 ms per loop

In [11]: A1.flags
Out[11]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Note that for column-major ordering the two buffers are no longer identical:

In [6]: A3=np.zeros_like(A2)

In [7]: A3.data = A1.data

In [20]: A2[0]
Out[20]: (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

In [21]: A2[1]
Out[21]: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)

In [16]: A3[0]
Out[16]: (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)

In [17]: A3[1]
Out[17]: (10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0)
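A copy is unavoidable in the column-major case: the F-ordered buffer stores whole columns contiguously, so reinterpreting it record by record scrambles the fields, as the A3 output shows. A minimal sketch of one workaround (assuming all fields share one dtype, as in the transcript above):

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
A1 = np.zeros((M, N), 'f', order='F')
X = np.arange(M, dtype='f')
A1[:, :] = X[:, None]              # fast broadcast fill of the F-ordered array

# Records only line up once the data is row-contiguous, so a
# C-ordered copy is needed before viewing the rows as records.
A_struct = np.ascontiguousarray(A1).view(dt).reshape(M)
```

Whether the copy eats the speedup gained from the column-wise fill depends on the workload; for a one-time fill it may still come out ahead.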

Upvotes: 2

Views: 464

Answers (1)

hpaulj

Reputation: 231395

These are not equivalent actions. One just generates an array (and assigns it to a variable, a minor action). The other generates the array and fills a column of a structured array.

my_col = fn_returning_1D_array(...)
test['column1'] = fn_returning_1D_array(...)

I think a fairer comparison would be to fill in the columns of a 2D array.

In [38]: M,N=1000,10
In [39]: A1=np.zeros((M,N),'f')   # 2D array
In [40]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [41]: A2=np.zeros((M,),dtype=dt)   # structured array
In [42]: X=np.arange(M+0.0)

In [43]: A1[:,0]=X   # fill a column
In [44]: A2['f0']=X   # fill a field

In [45]: timeit for n in range(N):A1[:,n]=X
10000 loops, best of 3: 65.3 µs per loop

In [46]: timeit for n in dt.names: A2[n]=X
10000 loops, best of 3: 40.6 µs per loop

I'm a bit surprised that filling the structured array is faster.

Of course the fast way to fill the 2D array is with broadcasting:

In [50]: timeit A1[:,:]=X[:,None]
10000 loops, best of 3: 29.2 µs per loop

But the improvement over filling the fields is not that great.

I don't see anything significantly wrong with filling a structured array field by field. It's got to be faster than generating a list of tuples to fill the whole array.
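For comparison, the list-of-tuples route pushes every element through a Python object; a quick sketch of the two fills side by side (names here are illustrative, not from the question):

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Field-by-field fill: one vectorized copy per column.
A2 = np.zeros(M, dtype=dt)
for name in dt.names:
    A2[name] = X

# List-of-tuples fill: every record passes through a Python tuple.
A4 = np.array([tuple(row) for row in np.column_stack([X] * N)], dtype=dt)
```

Timing the second form on large M shows the Python-level overhead the field-by-field loop avoids.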

I believe A1 and A2 have identical data buffers. For example, if I make a zeros copy of A2, I can replace its data buffer with A1's and get a valid structured array:

In [64]: A3=np.zeros_like(A2)
In [65]: A3.data=A1.data

So the faster way of filling the structured array is to do the fastest 2D fill, followed by this data assignment.
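A view achieves the same buffer sharing without the `.data` assignment (which newer NumPy versions disallow); a minimal sketch, again assuming all fields share one dtype:

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
A1 = np.zeros((M, N), 'f')         # C-ordered 2D array
X = np.arange(M, dtype='f')

A1[:, :] = X[:, None]              # fastest 2D fill via broadcasting
rec = A1.view(dt).reshape(M)       # same buffer, reinterpreted as records
```

No data is copied here: writing through A1 afterwards is visible through rec and vice versa.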

But in the general case the challenge is to create a compatible 2D array. It's easy when all field dtypes are the same. With a mix of dtypes you'd have to work at the byte level. There are some advanced dtype specifications (with offsets, etc), that may facilitate such a mapping.
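One hedged sketch of that byte-level mapping for a mixed dtype, using the offsets recorded in `dt.fields` (field names here are made up for illustration; this relies on NumPy permitting dtype-changing views when the last axis is contiguous):

```python
import numpy as np

M = 1000
dt = np.dtype([('a', 'f4'), ('b', 'i4')])

# One row of raw bytes per record.
raw = np.zeros((M, dt.itemsize), dtype='u1')

# Fill each field by viewing its byte columns as that field's dtype.
for name in dt.names:
    field_dt, offset = dt.fields[name]
    col = raw[:, offset:offset + field_dt.itemsize].view(field_dt)
    col[:, 0] = np.arange(M)

# Reinterpret the byte rows as structured records (no copy).
rec = raw.view(dt).reshape(M)
```

This is only workable when the dtype has no padding surprises; for aligned dtypes the explicit offsets in `dt.fields` are what make the mapping safe.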


Now you have shifted the focus to Fortran order. In the case of a 2D array that does help, but it does so at the expense of row-oriented operations.

In [89]: A1=np.zeros((M,N),'f',order='F')

In [90]: timeit A1[:,:]=X[:,None]
100000 loops, best of 3: 18.2 µs per loop

One thing that you haven't mentioned, at least not before the last rewrite of the question, is how you intend to use this array. If it is just a place to store a number of arrays by name, you could use a straightforward Python dictionary:

In [96]: timeit D={name:X.copy() for name in dt.names}
10000 loops, best of 3: 25.2 µs per loop

Though this really is a time test for X.copy().

In any case, there isn't any equivalent to Fortran order when dealing with dtypes. None of the array operations (reshape, swapaxes, strides, broadcasting) crosses the 'dtype' boundary.

Upvotes: 1
