Reputation: 366
I am working on a program that uses an external C library to parse data from external sources and a Python library to run an optimisation problem on it. The optimisation is very time consuming, so using several CPUs would be a significant plus.
Basically, I wrapped the C(++) structures with Cython as follows:
cdef class CObject(object):
    cdef long p_sthg
    cdef OBJECT* sthg
    def __cinit__(self, sthg):
        self.p_sthg = sthg
        self.sthg = <OBJECT*> self.p_sthg
    def __reduce__(self):
        return (rebuildObject, (self.p_sthg, ))
    def getManyThings(self):
        ...
        return blahblahblah
Then I create my resource intensive process:
p = mp.Process(target=make_process, args=(cobject,))
As you can immediately guess (of course I didn't), even though I manage to unpickle the CObject, the pointer is passed to the new process, but not the C structure it refers to.
I can find some resources explaining how to put Python objects into shared memory, but that would not be sufficient in my case, as I would need to share C objects I barely know about (and other objects that are pointed at by the top CObject) between the Python processes.
In case it matters, the good thing is that I can survive with read-only access...
Does anyone have any experience with this kind of problem?
My other idea would be to find a way to write the binary representation of the object I need to pass to a file and read it from the other process...
Upvotes: 4
Views: 1050
Reputation: 30161
There's no single, general way to do this.
You can put the C object into shared memory by constructing it inside a suitable mmap(2) region (also available via mmap in the Python standard library; use MAP_SHARED|MAP_ANONYMOUS). This requires the entire object to lie within the mmap, and will likely make it impossible for the object to use pointers (but offsets relative to the object are probably OK provided they point within the mmap). If the object has any file descriptors or other handles of any kind, those will almost certainly not work correctly. Note that mmap() is like malloc(); you have to do a corresponding munmap() or you leak the memory.
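As a rough illustration of the mmap approach, here is a minimal sketch using ctypes. The Record struct and the share_via_mmap function are made-up stand-ins (not part of the question's library), and it assumes a Unix system where os.fork() is available. The child writes through the mapping and the parent sees the change, which shows that the pages really are shared rather than copied.

```python
import ctypes
import mmap
import os


class Record(ctypes.Structure):
    # Hypothetical stand-in for the C struct: plain values only, no pointers.
    _fields_ = [("count", ctypes.c_int32), ("scale", ctypes.c_double)]


def share_via_mmap():
    """Construct a Record inside a shared anonymous mapping and let a child touch it."""
    # Unix-only: a MAP_SHARED | MAP_ANONYMOUS region stays shared across fork().
    region = mmap.mmap(-1, mmap.PAGESIZE,
                       flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    record = Record.from_buffer(region)  # built in place inside the mapping
    record.count, record.scale = 7, 0.5

    pid = os.fork()
    if pid == 0:
        # Child process: writes land in the same physical pages as the parent's.
        record.count += 1
        os._exit(0)
    os.waitpid(pid, 0)

    result = (record.count, record.scale)
    del record      # drop the buffer export before closing the mapping
    region.close()  # the munmap() counterpart
    return result
```

For the read-only case in the question, the child would simply read the fields instead of modifying them, and no locking would be needed.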
You could copy the C object into shared memory (with e.g. memcpy(3)). This is likely less efficient, and requires the object to be reasonably copiable. memcpy does not magically fix up pointers and other references. On the plus side, this does not require you to control the object's construction.
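A sketch of the copy approach, assuming Python 3.8+ for multiprocessing.shared_memory and a flat, pointer-free struct. Point, copy_to_shared and read_from_shared are hypothetical names chosen for illustration. The byte-for-byte copy plays the role of memcpy(), with the same limitation: any pointer inside the object would be copied as a raw address.

```python
import ctypes
from multiprocessing import shared_memory


class Point(ctypes.Structure):
    # Hypothetical flat struct; a raw copy is only safe because it has no pointers.
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]


def copy_to_shared(obj):
    """Copy the raw bytes of an existing ctypes object into a new shared block."""
    size = ctypes.sizeof(obj)
    shm = shared_memory.SharedMemory(create=True, size=size)
    # Byte-for-byte copy, like memcpy(): nothing inside the object is fixed up.
    shm.buf[:size] = bytes(obj)
    return shm


def read_from_shared(name):
    """Attach to the block by name (e.g. from another process) and rebuild a Point."""
    shm = shared_memory.SharedMemory(name=name)
    # The block may be rounded up to a page, so slice to the struct size.
    raw = bytes(shm.buf[:ctypes.sizeof(Point)])
    shm.close()
    return Point.from_buffer_copy(raw)
```

The creating process would pass shm.name to the worker (a short string that pickles fine), and call shm.close() and shm.unlink() once everyone is done.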
You can serialize the object to some binary representation and pass it through a pipe(2) (also available via os.pipe() in Python). For simple cases, this is a field-by-field copy, but again, pointers will need care. You will have to (un)swizzle your pointers to make them work correctly after (de)serialization. This is the most easily generalized technique, but requires knowledge of how the object is structured, or a black-box function that does the serialization for you.
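To make the pointer swizzling concrete, here is a small sketch with a hypothetical singly linked list standing in for the real C structure (assumed to be acyclic). On the way out, each "next" pointer becomes an index into the serialized array; on the way in, indices are turned back into references. For brevity both ends of the pipe live in one process; in practice the reader would be the worker process.

```python
import os
import struct

# One record per node: value (double) and index of the next node (-1 means NULL).
NODE = struct.Struct("<di")


class Node:
    """Hypothetical stand-in for a C struct holding a pointer to another struct."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def serialize(head):
    """Flatten the list; 'next' pointers are swizzled into array indices."""
    nodes = []
    node = head
    while node is not None:  # assumes no cycles
        nodes.append(node)
        node = node.next
    parts = [struct.pack("<I", len(nodes))]
    for i, n in enumerate(nodes):
        parts.append(NODE.pack(n.value, i + 1 if n.next is not None else -1))
    return b"".join(parts)


def deserialize(data):
    """Rebuild the list; indices are unswizzled back into object references."""
    (count,) = struct.unpack_from("<I", data, 0)
    records = [NODE.unpack_from(data, 4 + i * NODE.size) for i in range(count)]
    nodes = [Node(value) for value, _ in records]
    for node, (_, nxt) in zip(nodes, records):
        node.next = nodes[nxt] if nxt >= 0 else None
    return nodes[0] if nodes else None


def send_through_pipe(head):
    """Write the serialized list into a pipe and rebuild it from the read end."""
    read_end, write_end = os.pipe()
    os.write(write_end, serialize(head))  # small payload, fits the pipe buffer
    os.close(write_end)
    data = b""
    while chunk := os.read(read_end, 4096):
        data += chunk
    os.close(read_end)
    return deserialize(data)
```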
Finally, you can create temporary files in /dev/shm and exchange information that way. These files are backed by RAM and are effectively the same as shared memory, but with perhaps a more familiar file-like interface. Note that /dev/shm is primarily a Linux convention; on other Unix systems, you should use shm_open(3) for full portability.
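A minimal sketch of the /dev/shm variant, using only ordinary file I/O. The helper names write_blob and read_blob are hypothetical, and the fallback to the regular temporary directory when /dev/shm is missing or not writable is my own assumption (it keeps the code running elsewhere, but the file is then no longer guaranteed to be RAM-backed).

```python
import os
import tempfile

# Prefer the RAM-backed /dev/shm on Linux; None means the default temp directory.
SHM_DIR = "/dev/shm" if os.access("/dev/shm", os.W_OK) else None


def write_blob(payload):
    """Write a binary blob to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=SHM_DIR, prefix="cobject-")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path


def read_blob(path):
    """Read the blob back, e.g. from the worker process that was given the path."""
    with open(path, "rb") as f:
        return f.read()
```

The parent would pass the path to the worker and remove the file with os.remove(path) once the worker has read it.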
Note that shared memory, in general, tends to be problematic. It requires inter-process synchronization, but the necessary locking primitives are far less developed than in the threading world. I recommend limiting shared memory to immutable objects or inherently lock-free designs (which are quite difficult to get right).
Upvotes: 4