Reputation: 1048
I'm trying to send an array of doubles from a C library to Python as efficiently as possible. The code to communicate with this library was created by a different company (it includes many methods, exceptions, etc.), but this particular function creates a list and inserts a Python object for every item in the C array, which is very inefficient if you care about speed.
Here's a snippet of the C code compiled to create the Python module:
static PyObject* foo(PyObject* self, PyObject* args) {
    double *val = 0;
    PyObject *retData;
    int listID, size, i;
    //more variables
    //Note that this uses the Python C API PyArg_ParseTuple to handle the parameters
    if (!PyArg_ParseTuple(args, "ii", &listID, &size)) {
        //send exception
    }
    //some code here that allocates an array to hold "val" and calls the C library
    retData = PyList_New(size);
    for (i = 0; i < size; i++) {
        PyList_SET_ITEM(retData, i, Py_BuildValue("d", val[i]));
    }
    //free resources, return the Python object
}
What I've found is that a Python array might be useful, with the added benefit of being suitable for multiprocessing.
If a Python array works as I imagine, I can allocate the array in Python and then let the C library fill it:
from cpython cimport array
import array
from dalibrary import dafunction

cdef array.array a = array.array('d', [])
array.resize(a, 1000)
dafunction(a, 1000)  # In a very "C" style, the array would be filled with values
print(a)
The problem is that I can't find documentation on the C code needed to use a Python array, at least not using the Python C API.
Note: I am aware of ctypes, but that would mean rewriting the whole module, which I'd rather not do if possible (though the lack of documentation might drive me there).
It seems that somebody already asked a similar question here, but it remains unsolved.
I managed to do what I wanted (as you may see in one of the answers) and even share the array between processes (a multiprocessing Array), but to my surprise it was actually a bit slower than supposedly inefficient (but robust) IPC methods, like passing a Python list through a queue.
Since using the Python C API is difficult, and it gave me zero improvement, I think the best answer for the community is the suggestion to use ctypes. I'll keep my answer for reference; maybe someone sending a huge piece of memory may benefit from it.
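For reference, the queue-based approach I'm comparing against is roughly the following: one process builds a plain Python list and puts it on a multiprocessing.Queue, and another process picks it up. This is only an illustrative sketch, not the exact benchmark:

import multiprocessing

SIZE = 1000

def consumer(q):
    # The list is pickled by the sender and unpickled here, i.e. the data is copied
    data = q.get()
    print(sum(data))

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=consumer, args=(q,))
    p.start()
    q.put([float(i) for i in range(SIZE)])  # a plain Python list of doubles
    p.join()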
Upvotes: 1
Views: 956
Reputation: 1048
I found that array.array (and multiprocessing.Array) do indeed support sharing their internal buffer, because both implement the buffer protocol, just as @user2357112-supports-Monica suggested. The trick, really, was understanding that I needed to use "y*" in PyArg_ParseTuple to receive the parameter as a Py_buffer.
This is an example of a C function that receives the array as a Py_buffer and uses it to alter the data. The same buffer is shared between two processes in Python.
C code:
//This function receives an object that implements
//the buffer protocol (such as array.array) and the size as an int.
static PyObject* duplicate(PyObject* self, PyObject* args) {
    int size;  //size could also have been calculated as buffer.len / sizeof(double)
    double *data_ptr;
    Py_buffer buffer;

    //The format string says we receive a Py_buffer ("y*") and an int ("i")
    if (!PyArg_ParseTuple(args, "y*i", &buffer, &size)) {
        PyErr_SetString(PyExc_TypeError, "ERROR: Getting expression for method duplicate\n");
        return NULL;
    }

    data_ptr = (double*) buffer.buf;
    for (int i = 0; i < size; i++) {
        data_ptr[i] *= 2.0;
    }

    //The buffer filled in by PyArg_ParseTuple must be released when we are done with it
    PyBuffer_Release(&buffer);
    return Py_BuildValue("");
}
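For the Python code below to be able to import testspeed, the function also has to be registered in the usual extension-module boilerplate. A minimal sketch of that part (the module and function names match the snippets here; the rest is the standard CPython setup):

static PyMethodDef testspeed_methods[] = {
    {"duplicate", duplicate, METH_VARARGS, "Double every element of a buffer of doubles."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef testspeed_module = {
    PyModuleDef_HEAD_INIT,
    "testspeed",        /* module name */
    NULL,               /* module docstring */
    -1,                 /* per-interpreter state size */
    testspeed_methods
};

PyMODINIT_FUNC PyInit_testspeed(void) {
    return PyModule_Create(&testspeed_module);
}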
Python code:
import testspeed
import multiprocessing

SIZE = 20

# Parallel processing
def my_func(i, shared_array):
    print(f"Duplicating in process {i}")
    testspeed.duplicate(shared_array, SIZE)

if __name__ == '__main__':
    # Initialize an Array that can be shared between processes
    multi_a = multiprocessing.Array('d', [i+1 for i in range(SIZE)])
    shared_array = multi_a.get_obj()

    print("array in Python")
    for i in range(SIZE):
        print(shared_array[i])

    p1 = multiprocessing.Process(target=my_func, args=(1, shared_array))
    p2 = multiprocessing.Process(target=my_func, args=(2, shared_array))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print("\n\narray after executing C code on each process")
    for i in range(SIZE):
        print(shared_array[i])
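For what it's worth, the same C function also works on a plain array.array in a single process, since array.array exposes the same buffer protocol. A small sketch, assuming the module is built as testspeed:

import array
import testspeed

a = array.array('d', [1.0, 2.0, 3.0])
testspeed.duplicate(a, len(a))  # the C code writes directly into a's internal buffer
print(a)                        # array('d', [2.0, 4.0, 6.0])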
The one detail that is easy to miss is the PyBuffer_Release call at the end of the C function: the Py_buffer filled in by PyArg_ParseTuple has to be released once you are done with it.
Thanks to @daragua and @user2357112-supports-Monica for their inputs.
Upvotes: 0
Reputation: 1183
There is a fair number of things to do to get your data from C to Python. First you should decide who owns the memory: is it the C code that generated the array, or is it Python? If the array is shared in many places and gets deleted on the C side without Python knowing, Python will crash, and vice versa.
So copying the array may not be a bad idea.
That being said, you could write a simple C function:
struct Array {
    int size;
    int* data;
};

struct Array get_my_array() {
    //...
    struct Array result = {size, val};
    return result;
}
Compile that into a dynamic library (my_lib.so) and wrap it using ctypes (it's a standard Python library for accessing foreign functions).
You would need to describe the Array return type:
from ctypes import Structure, POINTER, c_int, CDLL, pointer
from ctypes.util import find_library

class Array(Structure):
    _fields_ = [("size", c_int), ("data", POINTER(c_int))]

my_lib = CDLL(find_library("my_lib"))
my_lib.get_my_array.restype = Array
Now you should be able to get your array and access its data and size (and guard yourself manually against out-of-bounds accesses).
You can also pass it to NumPy, for example. Fortunately, there is a fairly complete example in the answer here: How to create n-dim numpy array from a pointer? Read it carefully, and don't forget to clean up the memory.
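A short sketch of what the usage could look like on the Python side, using the names from the snippets above (numpy.ctypeslib.as_array wraps the pointer without copying; whether you want a view or a copy depends on who owns the memory, as discussed above):

import numpy as np

arr = my_lib.get_my_array()
# Plain ctypes access: only index up to arr.size to stay in bounds
values = [arr.data[i] for i in range(arr.size)]
# Zero-copy NumPy view over the same memory
np_view = np.ctypeslib.as_array(arr.data, shape=(arr.size,))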
Note that you can also do it the other way around: if you know in Python the size of the array to create and just need the C code to populate it, you can create it with ctypes and pass it to a C function that takes the pointer and the size.
size = 1000  # however many elements you need
ArrayType = c_int * size
array = ArrayType()
my_lib.populate_array(pointer(array), size)  # left as an exercise
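In case it helps, the C side of that exercise could look roughly like this (the populate_array name matches the call above; the body is only an illustration):

void populate_array(int* data, int size) {
    for (int i = 0; i < size; i++) {
        data[i] = i * i;  // fill with whatever the library actually computes
    }
}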
Ctypes is very handy, and makes a lot of sense when you know your way around C.
Upvotes: 1