danb
danb

Reputation: 71

arrays of arrays in cython

How can one declare an array of arrays in cython?

More precisely, I want to construct (declare and then initialize) an m by n matrix, call it A, in which each entry [i,j] is a 1-dimensional array of doubles (of length min(i,j), filled with zeros) of the form

cdef np.ndarray[np.double_t, ndim=1] A[i,j]
A[i,j] = np.zeros((min(i,j)), dtype=np.double)

For (m,n)=(4,3), print A should return something like this:

[[[], [], []],
[[], [0.], [0.]],
[[], [0.], [0.,0.]],
[[], [0.], [0.,0.]]]

How do I declare and initialize A?

Upvotes: 6

Views: 2691

Answers (1)

Veedrac
Veedrac

Reputation: 60227

The object method:

import numpy

def thing(int m, int n):
    cdef int i, j
    cdef object[:, :] A = numpy.empty((m, n), dtype=object)

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i, j] = numpy.zeros(min(i, j))

    return A

Note that that object[:, :] syntax is the newer version, the numpy.ndarray[object, ndim=2] version is deprecated. The newer version is GIL-free (well, probably not when using object types), normally faster (never slower), type-agnostic (works on anything supporting memoryview) and cleaner.

If you want to iterate over the sub-arrays, you'd be doing:

for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        subarray = A[i, j]
        for k in range(subarray.size):
            ...

and you can either type subarray to object (best for small subarrays) or float[:] (best for large subarrays).


The C-level solution is proving to be very tricky. I have a feeling you'll basically end up writing it with pure-C types.

So I'm abandoning that, and here's what I'd do:

import numpy

def thing(int m, int n):
    cdef int i, j

    cdef float[:, :, :] A = numpy.zeros((m, n, min(m, n)), dtype=float)
    cdef int[:, :] A_lengths = numpy.empty((m, n), dtype=int)

    for i in range(A_lengths.shape[0]):
        for j in range(A_lengths.shape[1]):
            A_lengths[i, j] = min(i, j)

    return A, A_lengths

Basically, make a 3D array and a 2D array of corresponding lengths. If there is only a linear variation in lengths (so the max length is a reasonable factor [I'd say up to about 10] of the mean length) then this should have acceptable overhead. It'll allow pure-C calculations while having a tasty memoryview interface.


That's all I've got. Take it or leave it.

Upvotes: 7

Related Questions