Reputation: 7591
Say we have a Cython class that wraps (via a pointer) a C++ class with an unknown/variable size in memory:
// poly.h
#include <vector>

class Poly {
public:
    std::vector<int> v;
    // [...] Methods to initialize/add/multiply/... coefficients [...] e.g.,
    Poly(int len, int val) { for (int i = 0; i < len; i++) { v.push_back(val); } }
    void add(Poly& p) { for (size_t i = 0; i < v.size(); i++) { v[i] += p.v[i]; } }
    int size() { return (int)v.size(); }  // used by the Cython wrapper below
};
We can conveniently expose operations like add in PyPoly using operator overloads (e.g., __add__/__iadd__):
# pywrapper.pyx
# distutils: language = c++

cdef extern from "poly.h":
    cdef cppclass Poly:
        Poly(int len, int val)
        void add(Poly& p)
        int size()

cdef class PyPoly:
    cdef Poly* c_poly

    def __cinit__(self, int l, int val):
        self.c_poly = new Poly(l, val)

    def __dealloc__(self):
        del self.c_poly

    def __add__(PyPoly self, PyPoly other):
        cdef PyPoly new_poly = PyPoly(self.c_poly.size(), 0)
        new_poly.c_poly.add(self.c_poly[0])
        new_poly.c_poly.add(other.c_poly[0])
        return new_poly
How to create an efficient 1D numpy array with this cdef class?
The naive way I'm using so far involves a np.ndarray of dtype object, which benefits from the existing operator overloads:
import numpy as np

pypoly_arr = np.array([PyPoly(10, val) for val in range(10)])
pypoly_sum = np.sum(pypoly_arr)  # Works thanks to the implemented PyPoly.__add__
However, the above solution has to go through Python code to resolve the data type and the proper way to deal with __add__ for every element, which becomes quite cumbersome for big array sizes.
Inspired by https://stackoverflow.com/a/45150611/9670056, I tried an array wrapper of my own, but I'm not sure how to create a vector[PyPoly], whether I should do that at all, or whether I should instead just hold a vector of borrowed references vector[Poly*], so that the call to np.sum could be handled (and parallelized) at the C++ level.
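For concreteness, here is a rough (untested) sketch of the kind of wrapper I have in mind; the names PolyArray and csum are just placeholders, and it assumes the owning PyPoly objects stay alive while the array is used:

from libcpp.vector cimport vector

cdef class PolyArray:
    cdef vector[Poly*] polys  # borrowed pointers; the owning PyPoly objects must outlive this array

    def append(self, PyPoly p):
        self.polys.push_back(p.c_poly)

    def csum(self, int length):
        # Reduce all polynomials into one fresh PyPoly entirely at the C++ level,
        # without any Python-level __add__ dispatch per element.
        cdef PyPoly total = PyPoly(length, 0)
        cdef Poly* p
        cdef size_t i
        for i in range(self.polys.size()):
            p = self.polys[i]
            total.c_poly.add(p[0])
        return total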
Any help/suggestions will be highly appreciated! (especially reworking the question/examples to make them as generic as possible and runnable)
Upvotes: 2
Views: 177
Reputation: 50298
This is not possible in Cython: Numpy does not support native Cython classes as a data type. The reason is that the Numpy code is written in C and is already compiled when your Cython code is compiled, so Numpy cannot use your native type directly. It has to go through an indirection, and this indirection is made possible by the object CPython type, which has the downside of being slow (mainly because of the actual indirection, but also a bit because of CPython interpreter overheads). Cython does not reimplement Numpy primitives, as that would be a huge amount of work. Numpy only supports a restricted, predefined set of data types. It does support custom user types, but such types are not as powerful as CPython classes (e.g. you cannot reimplement custom operators on items like you did).
Just-in-time (JIT) compiler modules like Numba can theoretically support this because they reimplement Numpy and generate code at runtime. However, support for JIT classes in Numba is experimental and, AFAIK, arrays of JIT classes are not yet supported.
Note that you do not need to build an array in this case. A basic loop is faster and uses less memory. Something (untested) like:
cdef int val
cdef PyPoly pypoly_sum

pypoly_sum = PyPoly(10, 0)
for val in range(1, 10):
    pypoly_sum += PyPoly(10, val)
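If the allocation of a temporary PyPoly per iteration matters, an in-place variant can reuse the wrapped C++ add directly. A minimal sketch (untested), assuming you add this method to the PyPoly class from the question:

def __iadd__(PyPoly self, PyPoly other):
    # Mutate self's underlying C++ vector in place instead of building a new PyPoly
    self.c_poly.add(other.c_poly[0])
    return self

With this, the += in the loop above mutates pypoly_sum in place rather than going through __add__.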
Upvotes: 1