ibarrond

Reputation: 7591

Cython: efficient custom numpy 1D array for cdef class

Say we have a class in Cython that wraps (via a pointer) a C++ class whose size in memory is unknown or variable:

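// poly.h
#include <cstddef>
#include <vector>

class Poly {
public:
    std::vector<int> v;
    // [...] Methods to initialize/add/multiply/... coefficients [...] e.g.,
    Poly(int len, int val) { for (int i = 0; i < len; i++) { this->v.push_back(val); } }
    // Element-wise addition; assumes p has at least as many coefficients as *this
    void add(Poly& p) { for (std::size_t i = 0; i < this->v.size(); i++) { this->v[i] += p.v[i]; } }
};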

We can conveniently expose operations like add in PyPoly using operator overloads (e.g., __add__/__iadd__):

# pywrapper.pyx
from libcpp.vector cimport vector

cdef extern from "poly.h":
    cdef cppclass Poly:
        vector[int] v
        Poly(int len, int val)
        void add(Poly& p)

cdef class PyPoly:
    cdef Poly* c_poly   # owned pointer to the wrapped C++ object

    def __cinit__(self, int l, int val):
        self.c_poly = new Poly(l, val)

    def __dealloc__(self):
        del self.c_poly

    def __add__(self, PyPoly other):
        cdef PyPoly new_poly = PyPoly(self.c_poly.v.size(), 0)
        new_poly.c_poly.add(self.c_poly[0])    # dereference: add() takes a Poly&
        new_poly.c_poly.add(other.c_poly[0])
        return new_poly

How to create an efficient 1D numpy array with this cdef class?

The naive way I'm using so far involves a np.ndarray of type object, which benefits from the existing operator overloads:

pypoly_arr = np.array([PyPoly(l=10, val=val) for val in range(10)], dtype=object)
pypoly_sum = np.sum(pypoly_arr)    # Works thanks to implemented PyPoly.__add__

However, this solution has to go through Python code for every element to work out the data type and dispatch to __add__, which becomes quite slow for large arrays.

Inspired by https://stackoverflow.com/a/45150611/9670056, I tried writing an array wrapper of my own, but I'm not sure how to create a vector[PyPoly], or whether I should instead just hold a vector of borrowed references (vector[Poly*]) so that the call to np.sum could be handled (and parallelized) at the C++ level.
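To make the vector[Poly*] idea concrete, here is a minimal sketch of what the C++ side of such a reduction might look like. It assumes a stand-in `Poly` with a public `v` as in poly.h, that all polynomials have the same length, and that the pointers are borrowed (each `PyPoly` keeps owning its `Poly`). The helper name `sum_polys` is hypothetical, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the Poly class from poly.h (public coefficient vector assumed).
class Poly {
public:
    std::vector<int> v;
    Poly(int len, int val) : v(len, val) {}
    void add(Poly& p) {
        assert(p.v.size() == v.size());  // same-length polynomials assumed
        for (std::size_t i = 0; i < v.size(); i++) { v[i] += p.v[i]; }
    }
};

// Hypothetical helper: sums polynomials held as borrowed (non-owning) pointers.
// The caller keeps ownership of every Poly and must keep them alive during the call.
Poly sum_polys(const std::vector<Poly*>& polys, int len) {
    Poly total(len, 0);                        // accumulator, starts at zero
    for (Poly* p : polys) { total.add(*p); }   // no Python call per element
    return total;
}
```

From Cython, a function like this could be declared in a cdef extern block and fed a vector[Poly*] built from each PyPoly.c_poly; that wiring, and any parallelization of the loop, is not shown here.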

Any help/suggestions will be highly appreciated! (especially to rework the question/examples to make them as generic and runnable as possible)

Upvotes: 2

Views: 177

Answers (1)

Jérôme Richard

Reputation: 50298

This is not possible in Cython. NumPy does not support native Cython classes as a data type: the NumPy code is written in C and is already compiled by the time your Cython code is compiled, so NumPy cannot use your native type directly. It has to go through an indirection, which is provided by the CPython object type. This has the downside of being slow (mainly because of the indirection itself, but also partly because of CPython interpreter overheads). Cython does not reimplement the NumPy primitives, as that would be a huge amount of work. NumPy only supports a restricted, predefined set of data types. It does support custom user-defined types, but such types are not as powerful as CPython classes (e.g., you cannot reimplement custom operators on items like you did).

Just-in-time (JIT) compiler modules like Numba can theoretically support this because they reimplement NumPy and generate code at runtime. However, support for JIT classes in Numba is experimental, and AFAIK arrays of JIT classes are not yet supported.

Note that you do not need to build an array in this case. A basic loop is faster and uses less memory. Something (untested) like:

cdef int val
cdef PyPoly pypoly_sum

pypoly_sum = PyPoly(l=10, val=0)
for val in range(1, 10):
    pypoly_sum += PyPoly(l=10, val=val)
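The same accumulation can also be pushed below the Python layer. As a rough sketch, assuming a stand-in `Poly` with a public `v` like the one in the question, a hypothetical C++ helper (`accumulate_range`, not part of any library) keeps a single accumulator and adds each term in place, so only one temporary `Poly` is alive at a time:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the Poly class from poly.h (public coefficient vector assumed).
class Poly {
public:
    std::vector<int> v;
    Poly(int len, int val) : v(len, val) {}
    void add(Poly& p) {
        assert(p.v.size() == v.size());  // same-length polynomials assumed
        for (std::size_t i = 0; i < v.size(); i++) { v[i] += p.v[i]; }
    }
};

// Hypothetical helper: adds Poly(len, val) for val in [first, last)
// into one accumulator without storing the individual terms.
Poly accumulate_range(int len, int first, int last) {
    Poly total(len, 0);
    for (int val = first; val < last; val++) {
        Poly term(len, val);  // temporary, destroyed at the end of each iteration
        total.add(term);      // in-place update, no new result object
    }
    return total;
}
```

Unlike the `__add__` from the question, which allocates a fresh PyPoly for every addition, this variant updates the accumulator in place.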

Upvotes: 1
