Reputation: 1334
Using pybind11, how can I expose an array of POD structs with NumPy, while also having them appear to the user as nice Python objects?
I am exposing a C style API to Python using pybind11. There are some types, implemented as simple POD structs in C, which would make more sense as opaque objects in Python. pybind11 allows me to do that and to define what the object looks like in Python.
I also want to expose a dynamically allocated array of those. Doing so using pybind11 and NumPy is possible, but I haven't found a way that's compatible with how I've already exposed the type itself.
I end up with two different Python types, which are not compatible with each other, even though the underlying C type is the same.
I am looking for a solution that doesn't involve unnecessary copies. Since all the data are POD, I assume it should be possible to just reinterpret the data as structs on the C side or as opaque objects on the Python side.
The C API is fixed, but I have freedom over how I design the Python API.
On the C/C++ side, the type looks like this:
    struct apiprefix_opaque_type
    {
        int inner_value;
    };
Using pybind11, I'm exposing the structure as an opaque object. It's not critical that inner_value is not exposed, but it simply doesn't have much value for the user, and it makes more sense to have a higher level type.
    namespace py = pybind11;

    void bindings(py::module_& m)
    {
        py::class_<apiprefix_opaque_type>(m, "opaque_type")
            .def(py::init([]() {
                apiprefix_opaque_type x;
                x.inner_value = -1;
                return x;
            }))
            .def("is_set", [](const apiprefix_opaque_type& x) -> bool { return x.inner_value != -1; });

        m.def("create_some_opaque", []() -> apiprefix_opaque_type {
            apiprefix_opaque_type x;
            x.inner_value = 42;
            return x;
        });
    }
With this in place, on the Python side I have the API behaviour I want.
>>> a = apitest.opaque_type()
>>> a.inner_value # Demonstrating that inner_value is not exposed.
AttributeError: 'apitest.opaque_type' object has no attribute 'inner_value'
>>> a.is_set()
False
>>> b = apitest.create_some_opaque()
>>> b.is_set()
True
Somewhere else in the API, I have a structure containing an array of these, as a pointer and count pair. For the sake of simplicity, let's pretend it's a global variable (even though in reality, it's a member of another dynamically allocated object).
    struct apiprefix_state
    {
        apiprefix_opaque_type* things;
        int num_things;
    };

    apiprefix_state g_state = { nullptr, 0 };
This array is big enough that I care about performance, hence my constraint on avoiding unnecessary copies.
From Python, I want to be able to read the array, modify the array, or replace the array entirely. I think it makes more sense if whoever last set the array retained ownership over it, but I'm not entirely sure.
Here is my current attempt at exposing the array with NumPy.
    void more_bindings(py::module_& m)
    {
        py::class_<apiprefix_state>(m, "state")
            .def(py::init([]() {
                return g_state;
            }))
            .def("create_things",
                [](apiprefix_state&, int size) -> py::array {
                    auto arr = py::array_t<apiprefix_opaque_type>(size);
                    return std::move(arr);
                })
            .def_property(
                "things",
                [](apiprefix_state& state) {
                    auto base = py::array_t<apiprefix_opaque_type>();
                    return py::array_t<apiprefix_opaque_type>(state.num_things, state.things, base);
                },
                [](apiprefix_state& state, py::array_t<apiprefix_opaque_type> things) {
                    state.things = nullptr;
                    state.num_things = 0;
                    if (things.size() > 0)
                    {
                        state.num_things = things.size();
                        state.things = (apiprefix_opaque_type*)things.request().ptr;
                    }
                });
    }
Given my rudimentary understanding of memory management in Python, I strongly suspect ownership is not properly implemented.
But the problem this question is about is that NumPy doesn't understand what apiprefix_opaque_type is.
>>> state = apitest.state()
>>> state.things
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: NumPy type info missing for struct apiprefix_opaque_type
>>>
If I add a dtype declaration...
PYBIND11_NUMPY_DTYPE(apiprefix_opaque_type, inner_value);
...now NumPy understands it, but there are now two incompatible Python types that refer to the same C type. Also, the implementation detail inner_value is exposed.
>>> state = apitest.state()
>>> state.things
array([], dtype=[('inner_value', '<i4')])
>>> state.things = state.create_things(10)
>>> a = apitest.opaque_type()
>>> a
<apitest.opaque_type object at 0x000001BABE6E72B0>
>>> state.things[0] = a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'apitest.opaque_type'
>>>
How can I expose my array of opaque objects?
Upvotes: 6
Views: 1164
Reputation: 967
If you just want to expose the array of things, then you could do something like this:
    apiprefix_opaque_type& apiprefix_state_get(apiprefix_state& s, size_t j)
    {
        return s.things[j];
    }

    void apiprefix_state_set(apiprefix_state& s, size_t j, const apiprefix_opaque_type& o)
    {
        s.things[j] = o;
    }

    py::class_<apiprefix_state>(m, "state")
        // ...
        .def("__getitem__", &apiprefix_state_get)
        .def("__setitem__", &apiprefix_state_set);
Adding range checks would obviously be a good idea. (You can also use lambdas; I just find explicit functions a bit more readable.)
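As a rough illustration of the range checks mentioned above, here is a minimal sketch of the two accessors with bounds checking added. The struct definitions are repeated only so the snippet stands on its own, and checked_index is a hypothetical helper that is not part of the original API. It throws std::out_of_range, which pybind11's built-in exception translation should surface in Python as IndexError. Negative Python-style indices are not handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Repeated from the question so the sketch is self-contained.
struct apiprefix_opaque_type { int inner_value; };
struct apiprefix_state { apiprefix_opaque_type* things; int num_things; };

// Hypothetical helper (not part of the original API): returns j unchanged
// if it is a valid index into s.things, otherwise throws std::out_of_range.
static std::size_t checked_index(const apiprefix_state& s, std::size_t j)
{
    assert(s.num_things >= 0);  // invariant assumed for the C API
    if (s.things == nullptr || j >= static_cast<std::size_t>(s.num_things))
        throw std::out_of_range("state index out of range");
    return j;
}

apiprefix_opaque_type& apiprefix_state_get(apiprefix_state& s, std::size_t j)
{
    return s.things[checked_index(s, j)];
}

void apiprefix_state_set(apiprefix_state& s, std::size_t j, const apiprefix_opaque_type& o)
{
    s.things[checked_index(s, j)] = o;
}
```

These functions can be bound with the same .def("__getitem__", ...) and .def("__setitem__", ...) calls as the unchecked versions.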
When you wrap things in a NumPy array you are exposing it as a buffer, and the structured dtype just provides information about what bytes at what offsets should be interpreted as ints. So you could actually write state.things[0] = 42 above (more generally, for a struct with multiple members you could assign a tuple). But it does not know how to extract an int from apiprefix_opaque_type to assign it to the field defined by the dtype.
If you want to expose things as a NumPy array then, as you've noted, ownership is an important question. As implemented above, Python will own any arrays created by create_things and manage the underlying memory. However, there are a couple of issues with your setter. First,
    state.things = nullptr;
    state.num_things = 0;
is a potential memory leak if the memory pointed to by state.things isn't managed by Python. Second, in this line
    state.things = (apiprefix_opaque_type*)things.request().ptr;
you are referencing memory managed by Python without incrementing its reference count, so there is a chance that apiprefix_state will be left with things pointing to memory that Python has already garbage collected.
It looks like you probably want to expose the global g_state, which is presumably managed by C++. In this case one possible method is
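    // Capsule with a no-op destructor: Python never frees the C++-owned memory.
    pybind11::capsule nogc(g_state.things, [](void *f) {});
    return pybind11::array_t<apiprefix_opaque_type>(
        { g_state.num_things },             // shape
        { sizeof(apiprefix_opaque_type) },  // strides, in bytes
        g_state.things,                     // existing data, not copied
        nogc                                // base object for the array
    );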
Alternatively you could use the buffer protocol directly or a memory view.
If you do want to always refer to a global state, then it's unusual to return it from the initializer
    .def(py::init([]() { return g_state; }))
Normally it would be something like
    .def_static("get_instance", ... )
but note that returning g_state by value doesn't quite do what you want, as it will copy g_state.
Upvotes: 2