How to expose an array of opaque type with pybind11 and NumPy

Question

TL;DR

Using pybind11, how can I expose an array of POD structs with NumPy while also having the elements appear to the user as nice Python objects?

The problem

I am exposing a C-style API to Python using pybind11. Some types, implemented as simple POD structs in C, would make more sense as opaque objects in Python. pybind11 lets me do that and define what the object looks like in Python.

I also want to expose a dynamically allocated array of those. Doing so using pybind11 and NumPy is possible, but I haven't found a way that's compatible with how I've already exposed the type itself.

I end up with two different Python types, which are not compatible with each other, even though the underlying C type is the same.

Constraints

I am looking for a solution that doesn't involve unnecessary copies. Since all the data are POD, I assume it should be possible to just reinterpret the data as structs on the C side or as opaque objects on the Python side.

The C API is fixed, but I have freedom over how I design the Python API.

Implementation details

On the C/C++ side, the type looks like this:

struct apiprefix_opaque_type
{
    int inner_value;
};
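Because the struct is plain old data, its layout can be mirrored from Python, which is what makes zero-copy reinterpretation plausible. The sketch below is an illustration only: it uses the standard-library ctypes module rather than pybind11, and the Python class mirroring the C struct is a hypothetical stand-in. It assumes inner_value is a plain C int and that the struct has no padding, and it shows a raw byte buffer being viewed as an array of these structs without any copy.

```python
import ctypes


# Hypothetical Python mirror of the C struct. ctypes.c_int is the
# platform's C int, so the layout should match apiprefix_opaque_type.
class apiprefix_opaque_type(ctypes.Structure):
    _fields_ = [("inner_value", ctypes.c_int)]


count = 3
# A raw, writable byte buffer large enough for `count` structs.
raw = bytearray(ctypes.sizeof(apiprefix_opaque_type) * count)

# from_buffer() returns a ctypes array that aliases `raw`; nothing is copied.
things = (apiprefix_opaque_type * count).from_buffer(raw)
things[1].inner_value = 42

# The write is visible through the original buffer because both objects
# share the same memory.
ints = memoryview(raw).cast("i")
print(ints[1])  # 42
```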

Using pybind11, I'm exposing the structure as an opaque object. It's not critical that inner_value stays hidden, but it simply doesn't have much value for the user, and a higher-level type makes more sense.

#include <pybind11/pybind11.h>

namespace py = pybind11;

void bindings(py::module_& m)
{
    py::class_<apiprefix_opaque_type>(m, "opaque_type")
        .def(py::init([]() {
            apiprefix_opaque_type x;
            x.inner_value = -1;
            return x;
        }))
        .def("is_set", [](const apiprefix_opaque_type& x) -> bool { return x.inner_value != -1; });

    m.def("create_some_opaque", []() -> apiprefix_opaque_type {
        apiprefix_opaque_type x;
        x.inner_value = 42;
        return x;
    });
}

With this in place, on the Python side I have the API behaviour I want.

>>> a = apitest.opaque_type()
>>> a.inner_value # Demonstrating that inner_value is not exposed.
AttributeError: 'apitest.opaque_type' object has no attribute 'inner_value'
>>> a.is_set()
False
>>> b = apitest.create_some_opaque()
>>> b.is_set()
True

Somewhere else in the API, I have a structure containing an array of these, as a pointer and count pair. For the sake of simplicity, let's pretend it's a global variable (even though in reality, it's a member of another dynamically allocated object).

struct apiprefix_state
{
    apiprefix_opaque_type* things;
    int num_things;
};

apiprefix_state g_state = { nullptr, 0 };

This array is large enough that performance matters, hence my constraint on avoiding unnecessary copies.

From Python, I want to be able to read the array, modify it, or replace it entirely. I think it makes the most sense for whoever last set the array to retain ownership of it, but I'm not entirely sure.
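One way to reason about the "whoever last set it owns it" policy is to keep the raw pointer and count, as the C API requires, alongside a reference to whichever Python object owns the memory. The sketch below models that policy in plain Python with the standard-library ctypes module; StateHandle, set_things and get_things are hypothetical names, and this is not how pybind11 itself manages lifetimes. Storing only an address, as a raw C pointer would, does nothing by itself to keep the memory alive, which is why the handle holds an explicit reference.

```python
import ctypes


# Hypothetical mirrors of the two C structs.
class apiprefix_opaque_type(ctypes.Structure):
    _fields_ = [("inner_value", ctypes.c_int)]


class apiprefix_state(ctypes.Structure):
    _fields_ = [
        ("things", ctypes.POINTER(apiprefix_opaque_type)),
        ("num_things", ctypes.c_int),
    ]


class StateHandle:
    """Hypothetical handle pairing the C struct with the owner of its array."""

    def __init__(self):
        self.c = apiprefix_state()  # things == NULL, num_things == 0
        self._owner = None          # keeps the current array alive

    def set_things(self, arr):
        # Store only a raw address, as the C API would. A bare address does
        # not keep `arr` alive, so the handle also holds a reference to it.
        address = ctypes.addressof(arr) if len(arr) else None
        self.c.things = ctypes.cast(address, ctypes.POINTER(apiprefix_opaque_type))
        self.c.num_things = len(arr)
        self._owner = arr  # the previous array's reference is dropped here

    def get_things(self):
        # Zero-copy view over things[0:num_things]; it is only valid while
        # self._owner (or whoever owns the memory) is alive.
        n = self.c.num_things
        if n == 0:
            return (apiprefix_opaque_type * 0)()
        array_ptr = ctypes.cast(self.c.things,
                                ctypes.POINTER(apiprefix_opaque_type * n))
        return array_ptr.contents


handle = StateHandle()
new_things = (apiprefix_opaque_type * 3)()
new_things[0].inner_value = 7
handle.set_things(new_things)
del new_things  # the handle's reference keeps the storage alive
print(handle.get_things()[0].inner_value)  # 7
```

The trade-off in this model is that a view returned by get_things() dangles once the array is replaced, since only the handle tracks ownership.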

Here is my current attempt at exposing the array with NumPy.

#include <pybind11/numpy.h>

void more_bindings(py::module_& m)
{
    py::class_<apiprefix_state>(m, "state")
        .def(py::init([]() {
            return g_state;
        }))
        .def("create_things",
             [](apiprefix_state&, int size) -> py::array {
                 auto arr = py::array_t<apiprefix_opaque_type>(size);
                 return std::move(arr);
             })
        .def_property(
            "things",
            [](apiprefix_state& state) {
                auto base = py::array_t<apiprefix_opaque_type>();
                return py::array_t<apiprefix_opaque_type>(state.num_things, state.things, base);
            },
            [](apiprefix_state& state, py::array_t<apiprefix_opaque_type> things) {
                state.things = nullptr;
                state.num_things = 0;
                if (things.size() > 0)
                {
                    state.num_things = things.size();
                    state.things = (apiprefix_opaque_type*)things.request().ptr;
                }
            });
}

Given my rudimentary understanding of memory management in Python, I strongly suspect ownership is not properly implemented.

But the problem this question is about is that NumPy doesn't understand what apiprefix_opaque_type is.

>>> state = apitest.state()
>>> state.things
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: NumPy type info missing for struct apiprefix_opaque_type
>>>

If I add a dtype declaration...

    PYBIND11_NUMPY_DTYPE(apiprefix_opaque_type, inner_value);

...NumPy understands it, but now there are two incompatible Python types that refer to the same C type. Also, the implementation detail inner_value is exposed.

>>> state = apitest.state()
>>> state.things
array([], dtype=[('inner_value', '<i4')])
>>> state.things = state.create_things(10)
>>> a = apitest.opaque_type()
>>> a

>>> state.things[0] = a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'apitest.opaque_type'
>>>
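For comparison, the target semantics can be modelled in pure Python. The sketch below is a model only, not a pybind11 or NumPy solution: it uses the standard-library ctypes module, and OpaqueType, OpaqueArray, _Raw and the create_some_opaque stand-in are all hypothetical names. Items read from the array are opaque wrappers that alias one shared, contiguous buffer, and assignment accepts only the opaque type and copies its POD bytes into place, so a single Python type is used on both sides.

```python
import ctypes


class _Raw(ctypes.Structure):
    # Hypothetical private mirror of apiprefix_opaque_type.
    _fields_ = [("inner_value", ctypes.c_int)]


class OpaqueType:
    """Opaque wrapper; inner_value is only reachable through private state."""
    __slots__ = ("_raw",)

    def __init__(self, raw=None):
        # Default matches the pybind11 constructor in the question (-1).
        self._raw = _Raw(-1) if raw is None else raw

    def is_set(self):
        return self._raw.inner_value != -1


def create_some_opaque():
    # Stand-in for the bound create_some_opaque() function.
    return OpaqueType(_Raw(42))


class OpaqueArray:
    """Fixed-size sequence over one contiguous buffer of _Raw structs."""

    def __init__(self, size):
        self._buf = (_Raw * size)()
        for i in range(size):
            self._buf[i].inner_value = -1

    def __len__(self):
        return len(self._buf)

    def __getitem__(self, i):
        # No copy: self._buf[i] is a _Raw that shares the buffer's memory.
        return OpaqueType(self._buf[i])

    def __setitem__(self, i, value):
        if not isinstance(value, OpaqueType):
            raise TypeError("expected OpaqueType, got "
                            + type(value).__name__)
        # Copies the struct's bytes into slot i, like a C struct assignment.
        self._buf[i] = value._raw


things = OpaqueArray(10)
a = OpaqueType()
print(things[0].is_set())  # False
things[0] = create_some_opaque()
print(things[0].is_set())  # True
things[1] = a  # accepted: same Python type as the elements
```

Because each element wrapper holds a reference into the shared buffer, the buffer stays alive as long as any element wrapper does.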

How can I expose my array of opaque objects?
