Reputation: 7694
I am looking for a way to assign a column of a numpy structured array more efficiently.
Example:
my_col = fn_returning_1D_array(...)
runs more than twice as fast on my machine as the same assignment into a column of a structured array:
test = np.ndarray(shape=(int(8e6),), dtype=np.dtype([('column1', 'S10'), ...more columns...]))
test['column1'] = fn_returning_1D_array(...)
I tried creating test with Fortran ordering, but it did not help; presumably the fields stay interleaved in memory.
Does anybody have any idea here? I would be willing to use low-level numpy interfaces and cython if they could help.
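For concreteness, the gap can be measured with a small benchmark. This is a hypothetical sketch: the dtype, sizes, and the stand-in for fn_returning_1D_array are illustrative, not the original 'S10' layout.

```python
import numpy as np
import timeit

M = int(1e6)
# Hypothetical stand-in for the real dtype; the original uses 'S10' columns.
dt = np.dtype([('column1', 'f8'), ('column2', 'f8')])
test = np.zeros((M,), dtype=dt)

def fn_returning_1D_array():
    # placeholder for the real generating function
    return np.arange(M, dtype='f8')

def plain_assignment():
    my_col = fn_returning_1D_array()           # bind result to a plain name

def structured_assignment():
    test['column1'] = fn_returning_1D_array()  # copy result into a field

t_plain = timeit.timeit(plain_assignment, number=10)
t_struct = timeit.timeit(structured_assignment, number=10)
print(t_plain, t_struct)  # the field assignment pays for an extra strided copy
```

The second timing includes a strided copy into the interleaved record layout, which is what the question is trying to avoid.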
The apparent equivalence of recarray column assignment and "normal" array column assignment holds only if the latter is created in row-major order. With column-major ordering the two assignments are far from equivalent:
Row-major
In [1]: import numpy as np
In [2]: M,N=int(1e7),10
In [4]: A1=np.zeros((M,N),'f')
In [9]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [10]: A2=np.zeros((M,),dtype=dt)
In [11]: X=np.arange(M+0.0)
In [13]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 2.36 s per loop
In [15]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.36 s per loop
In [16]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 334 ms per loop
In [8]: A1.flags
Out[8]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
Column-major
In [1]: import numpy as np
In [2]: M,N=int(1e7),10
In [3]: A1=np.zeros((M,N),'f', 'F')
In [4]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [5]: A2=np.zeros((M,),dtype=dt)
In [6]: X=np.arange(M+0.0)
In [8]: %timeit for n in range(N):A1[:,n]=X
1 loops, best of 3: 374 ms per loop
In [9]: %timeit for n in dt.names: A2[n]=X
1 loops, best of 3: 2.43 s per loop
In [10]: %timeit A1[:,:]=X[:,None]
1 loops, best of 3: 380 ms per loop
In [11]: A1.flags
Out[11]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
Note that for column-major ordering the two buffers are no longer identical:
In [6]: A3=np.zeros_like(A2)
In [7]: A3.data = A1.data
In [20]: A2[0]
Out[20]: (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
In [21]: A2[1]
Out[21]: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
In [16]: A3[0]
Out[16]: (0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
In [17]: A3[1]
Out[17]: (10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0)
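The byte-level difference can also be checked directly by comparing raw buffers; a sketch (the sizes here are illustrative):

```python
import numpy as np

M, N = 100, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Structured array filled field by field: records are stored contiguously
A2 = np.zeros((M,), dtype=dt)
for n in dt.names:
    A2[n] = X

# Row-major 2D array with the same values
A1_c = np.zeros((M, N), 'f', order='C')
A1_c[:, :] = X[:, None]
A1_f = np.asfortranarray(A1_c)  # same values, column-major layout

print(A1_c.tobytes() == A2.tobytes())     # True: rows line up with records
print(A1_f.tobytes('A') == A2.tobytes())  # False: whole columns are contiguous
```

The C-ordered buffer is byte-for-byte the structured array's buffer; the Fortran-ordered one stores each column contiguously, so the records no longer line up.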
Upvotes: 2
Views: 464
Reputation: 231395
These are not equivalent actions. One just generates an array (and assigns it to a variable, a minor action). The other generates the array and fills a column of a structured array.
my_col = fn_returning_1D_array(...)
test['column1'] = fn_returning_1D_array(...)
I think a fairer comparison would be to fill the columns of a 2D array.
In [38]: M,N=1000,10
In [39]: A1=np.zeros((M,N),'f') # 2D array
In [40]: dt=np.dtype(','.join(['f' for _ in range(N)]))
In [41]: A2=np.zeros((M,),dtype=dt) # structured array
In [42]: X=np.arange(M+0.0)
In [43]: A1[:,0]=X # fill a column
In [44]: A2['f0']=X # fill a field
In [45]: timeit for n in range(N):A1[:,n]=X
10000 loops, best of 3: 65.3 µs per loop
In [46]: timeit for n in dt.names: A2[n]=X
10000 loops, best of 3: 40.6 µs per loop
I'm a bit surprised that filling the structured array is faster.
Of course the fast way to fill the 2D array is with broadcasting:
In [50]: timeit A1[:,:]=X[:,None]
10000 loops, best of 3: 29.2 µs per loop
But the improvement over filling the fields is not that great.
I don't see anything significantly wrong with filling a structured array field by field. It's got to be faster than generating a list of tuples to fill the whole array.
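To illustrate that claim, here is a sketch comparing the two fill strategies; the field names f0, f1, … come from the comma-joined dtype, and the sizes are illustrative:

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Strategy 1: one vectorized assignment per field
A2 = np.zeros((M,), dtype=dt)
for n in dt.names:
    A2[n] = X

# Strategy 2: build a Python-level list of tuples, one tuple per record
rows = [tuple([x] * N) for x in X]
A4 = np.array(rows, dtype=dt)

# Both give the same records, but strategy 2 loops in Python over all M rows
print(all(np.array_equal(A2[n], A4[n]) for n in dt.names))  # True
```

Strategy 1 does N vectorized copies; strategy 2 constructs M Python tuples, which dominates for large M.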
I believe A1 and A2 have identical data buffers. For example, if I make a zeros copy of A2, I can replace its data buffer with A1's and get a valid structured array:
In [64]: A3=np.zeros_like(A2)
In [65]: A3.data=A1.data
So the faster way of filling the structured array is to do the fastest 2D fill, followed by this data assignment.
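In current NumPy versions, assigning to ndarray.data this way is no longer supported; the same zero-copy reinterpretation can be sketched with view, assuming uniform field dtypes and a C-contiguous source:

```python
import numpy as np

M, N = 1000, 10
dt = np.dtype(','.join(['f'] * N))
X = np.arange(M, dtype='f')

# Fastest 2D fill: broadcast down the rows of a C-ordered array
A1 = np.zeros((M, N), 'f')
A1[:, :] = X[:, None]

# Reinterpret the same buffer as one record per row -- no copy is made
A3 = A1.view(dt).reshape(M)

print(np.shares_memory(A1, A3))  # True: A3 is a view of A1's buffer
print(A3[1])                     # record built from row 1 of A1
```

Because A3 is a view, any later write through A1 is visible through the structured fields as well.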
But in the general case the challenge is to create a compatible 2D array. It's easy when all field dtypes are the same. With a mix of dtypes you'd have to work at the byte level. There are some advanced dtype
specifications (with offsets, etc), that may facilitate such a mapping.
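As a hypothetical illustration of such an offset-based specification (the field names and layout here are made up):

```python
import numpy as np

# A record of one little-endian int32 followed by one float32, laid out
# explicitly with offsets inside an 8-byte item.
dt = np.dtype({'names': ['a', 'b'],
               'formats': ['<i4', '<f4'],
               'offsets': [0, 4],
               'itemsize': 8})

buf = np.zeros(3, dtype=dt)
buf['a'] = [1, 2, 3]
buf['b'] = [0.5, 1.5, 2.5]

# The same bytes reinterpreted as raw 32-bit words: column 0 is field 'a',
# column 1 holds the bit pattern of field 'b'.
raw = buf.view('<i4').reshape(3, 2)
print(raw[:, 0])  # [1 2 3]
```

The offsets pin each field to a byte position inside the record, which is the kind of byte-level mapping a mixed-dtype layout would need.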
Now you have shifted the focus to Fortran order. In the case of a 2D array that does help, but it does so at the expense of row-oriented operations.
In [89]: A1=np.zeros((M,N),'f',order='F')
In [90]: timeit A1[:,:]=X[:,None]
100000 loops, best of 3: 18.2 µs per loop
One thing that you haven't mentioned, at least not before the last rewrite of the question, is how you intend to use this array. If it is just a place to store a number of arrays by name, you could use a straightforward Python dictionary:
In [96]: timeit D={name:X.copy() for name in dt.names}
10000 loops, best of 3: 25.2 µs per loop
Though this really is a time test for X.copy().
In any case, there isn't any equivalent to Fortran order when dealing with dtypes. None of the array operations like reshape, swapaxes, strides, or broadcasting cross the 'dtype' boundary.
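That said, later NumPy versions (1.16 onward) added helpers in numpy.lib.recfunctions that bridge that boundary; a sketch:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype(','.join(['f'] * 3))
A2 = np.zeros((4,), dtype=dt)
for i, n in enumerate(dt.names):
    A2[n] = i  # fill field n with the constant i

# For uniform field dtypes this returns a plain 2D array over the same data,
# so ordinary operations (transpose, reshape, broadcasting) apply again.
U = rfn.structured_to_unstructured(A2)
print(U.shape)  # (4, 3)
print(U[0])     # [0. 1. 2.]
```

This doesn't make the structured array itself Fortran-ordered, but it recovers a 2D view on which the usual axis operations work.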
Upvotes: 1