Reputation: 1048
I'm trying to send an array of doubles from a C library to Python as efficiently as possible. The code to communicate with this library was created by a different company (it includes many methods, exceptions, etc.), but this particular function creates a list and inserts a Python object for every item in the C array, which is very inefficient if you care about speed.
Here's a snippet of the C code compiled to create the Python module:
static PyObject* foo(PyObject* self, PyObject* args) {
    double *val = 0;
    PyObject *retData;
    int listID, size, i;
    //more variables
    //Note that this uses the Python C API PyArg_ParseTuple to handle the parameters
    if (!PyArg_ParseTuple(args, "ii", &listID, &size)) {
        //send exception
    }
    //some code here that allocates an array to hold "val" and calls the C library
    retData = PyList_New(size);
    for (i = 0; i < size; i++) {
        PyList_SET_ITEM(retData, i, Py_BuildValue("d", val[i]));
    }
    //free resources, return the Python object
}
What I've found is that a Python array might be useful, with the added benefit of being suitable for multiprocessing.
If a Python array works as I imagine, I can allocate the array in Python and then let the C library fill it:
from cpython cimport array
import array
from dalibrary import dafunction

cdef array.array a = array.array('d', [])
array.resize(a, 1000)
dafunction(a, 1000)  # In a very "C" style, the array would be filled with values
print(a)
The problem is that I can't find documentation on the C code needed to use a Python array, at least not using the Python C API.
Note: I am aware of ctypes, but that would mean rewriting the whole module, which I'd rather not do if possible (though the lack of documentation might drive me there).
It seems that somebody already asked a similar question here, but it remains unsolved.
I managed to do what I wanted (as you may see in one of the answers) and even share the array between processes (a multiprocessing Array), but to my surprise it was actually a bit slower than supposedly inefficient (but robust) IPC methods, like passing a Python list through a queue.
Since using the Python C API is difficult, and it gave me zero improvement, I think the best answer for the community is the suggestion to use ctypes. I'll keep my answer for reference; maybe someone sending a huge piece of memory may benefit from it.
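For reference, the queue-based approach I'm comparing against is roughly the following: one process builds a plain Python list and puts it on a multiprocessing.Queue, and another process picks it up. This is only an illustrative sketch, not the exact benchmark:

import multiprocessing

SIZE = 1000

def consumer(q):
    # The list is pickled by the sender and unpickled here, i.e. the data is copied
    data = q.get()
    print(sum(data))

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=consumer, args=(q,))
    p.start()
    q.put([float(i) for i in range(SIZE)])  # a plain Python list of doubles
    p.join()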
Upvotes: 1
Views: 956
Reputation: 1048
I found that array.array (and multiprocessing.Array) do indeed support sharing their internal buffer, because both implement the buffer protocol, just as @user2357112-supports-Monica suggested. The trick, really, was understanding that I needed to use "y*" in PyArg_ParseTuple to receive the parameter as a Py_buffer.
This is an example of a C function that receives the array as a Py_buffer and uses it to alter the data. The same buffer is shared between two processes in Python.
C code:
//This function receives an object that implements
//the buffer protocol (such as array.array) and the size as an int.
static PyObject* duplicate(PyObject* self, PyObject* args) {
    int size;  //size could also have been calculated as buffer.len / sizeof(double)
    double *data_ptr;
    Py_buffer buffer;

    //The format string says we receive a Py_buffer ("y*") and an int ("i")
    if (!PyArg_ParseTuple(args, "y*i", &buffer, &size)) {
        PyErr_SetString(PyExc_TypeError, "ERROR: Getting expression for method duplicate\n");
        return NULL;
    }

    data_ptr = (double*) buffer.buf;
    for (int i = 0; i < size; i++) {
        data_ptr[i] *= 2.0;
    }

    //The buffer filled in by PyArg_ParseTuple must be released when we are done with it
    PyBuffer_Release(&buffer);
    return Py_BuildValue("");
}
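For the Python code below to be able to import testspeed, the function also has to be registered in the usual extension-module boilerplate. A minimal sketch of that part (the module and function names match the snippets here; the rest is the standard CPython setup):

static PyMethodDef testspeed_methods[] = {
    {"duplicate", duplicate, METH_VARARGS, "Double every element of a buffer of doubles."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef testspeed_module = {
    PyModuleDef_HEAD_INIT,
    "testspeed",        /* module name */
    NULL,               /* module docstring */
    -1,                 /* per-interpreter state size */
    testspeed_methods
};

PyMODINIT_FUNC PyInit_testspeed(void) {
    return PyModule_Create(&testspeed_module);
}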
Python code:
import testspeed
import multiprocessing

SIZE = 20

# Parallel processing
def my_func(i, shared_array):
    print(f"Duplicating in process {i}")
    testspeed.duplicate(shared_array, SIZE)

if __name__ == '__main__':
    # Initialize an Array that can be shared between processes
    multi_a = multiprocessing.Array('d', [i+1 for i in range(SIZE)])
    shared_array = multi_a.get_obj()

    print("array in Python")
    for i in range(SIZE):
        print(shared_array[i])

    p1 = multiprocessing.Process(target=my_func, args=(1, shared_array))
    p2 = multiprocessing.Process(target=my_func, args=(2, shared_array))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print("\n\narray after executing C code on each process")
    for i in range(SIZE):
        print(shared_array[i])
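For what it's worth, the same C function also works on a plain array.array in a single process, since array.array exposes the same buffer protocol. A small sketch, assuming the module is built as testspeed:

import array
import testspeed

a = array.array('d', [1.0, 2.0, 3.0])
testspeed.duplicate(a, len(a))  # the C code writes directly into a's internal buffer
print(a)                        # array('d', [2.0, 4.0, 6.0])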
The one detail that is easy to miss is the PyBuffer_Release call at the end of the C function: the Py_buffer filled in by PyArg_ParseTuple has to be released once you are done with it.
Thanks to @daragua and @user2357112-supports-Monica for their inputs.
Upvotes: 0
Reputation: 1183
There is a fair number of things to do to get your data from C to Python. First you should decide who owns the memory: is it the C code that generated the array, or is it Python? If the array is shared in many places and gets deleted on the C side without Python knowing, Python will crash, and vice versa.
So copying the array may not be a bad idea.
That being said, you could write a simple C function:
struct Array {
    int size;
    int* data;
};

struct Array get_my_array() {
    //...
    struct Array result = {size, val};
    return result;
}
Compile that into a dynamic library (my_lib.so) and wrap it using ctypes (it's a standard Python library for accessing foreign functions).
You would need to describe the Array return type:
from ctypes import Structure, POINTER, c_int, CDLL, pointer
from ctypes.util import find_library

class Array(Structure):
    _fields_ = [("size", c_int), ("data", POINTER(c_int))]

my_lib = CDLL(find_library("my_lib"))
my_lib.get_my_array.restype = Array
Now you should be able to get your array and access its data and size (and guard yourself manually against out-of-bounds accesses).
You can also pass it to NumPy, for example. Fortunately, there is a fairly complete example in the answer here: How to create n-dim numpy array from a pointer? Read it carefully, and don't forget to clean up the memory.
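A short sketch of what the usage could look like on the Python side, using the names from the snippets above (numpy.ctypeslib.as_array wraps the pointer without copying; whether you want a view or a copy depends on who owns the memory, as discussed above):

import numpy as np

arr = my_lib.get_my_array()
# Plain ctypes access: only index up to arr.size to stay in bounds
values = [arr.data[i] for i in range(arr.size)]
# Zero-copy NumPy view over the same memory
np_view = np.ctypeslib.as_array(arr.data, shape=(arr.size,))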
Note that you can also do it the other way around: if you know in Python the size of the array to create and just need the C code to populate it, you can create it with ctypes and pass it to a C function that takes the pointer and the size.
size = 1000  # however many elements you need
ArrayType = c_int * size
array = ArrayType()
my_lib.populate_array(pointer(array), size)  # left as an exercise
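In case it helps, the C side of that exercise could look roughly like this (the populate_array name matches the call above; the body is only an illustration):

void populate_array(int* data, int size) {
    for (int i = 0; i < size; i++) {
        data[i] = i * i;  // fill with whatever the library actually computes
    }
}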
Ctypes is very handy, and makes a lot of sense when you know your way around C.
Upvotes: 1