Reputation: 455
I have a constant flow of medium-sized ndarrays (each around 10-15 MB in memory) on which I call ndarray.tobytes() before sending them to the next part of the pipeline.
Currently serialization takes about 70-100 ms per array.
I was wondering, is this as fast as it can get, or is there a faster (maybe not as pretty) way to accomplish it?
Clarification: the arrays are images, the next step in the pipeline is a C++ function, and I don't want to save them to a file.
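For reference, a minimal sketch of the setup described above; the shape and dtype are assumptions, but any uint8 image array of a similar size behaves the same:

    import time
    import numpy as np

    # Roughly the scenario described: a ~12 MB image array serialized with tobytes().
    img = np.random.randint(0, 256, size=(2000, 2000, 3), dtype=np.uint8)

    start = time.perf_counter()
    payload = img.tobytes()  # copies the entire buffer into a new bytes object
    elapsed = time.perf_counter() - start
    print(f"tobytes() took {elapsed * 1000:.1f} ms for {img.nbytes / 1e6:.1f} MB")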
Upvotes: 1
Views: 1581
Reputation: 249293
There is no need to serialize them at all! You can let C++ read the array's memory directly. One way is to invoke a C++ function with the PyObject that is your NumPy array. Another is to let C++ allocate the NumPy array in the first place and have Python populate the elements before returning control to C++; for that I have some open source code built atop Boost Python that you can use: https://github.com/jzwinck/pccl/blob/master/NumPyArray.hpp
Your goal should be "zero copy": you never copy the bytes of the array, only references to the array (or to data within it) plus its dimensions.
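As a concrete illustration of the zero-copy handoff (a sketch, not the linked Boost Python code), here is the first approach done from the Python side with ctypes. The library name libpipeline.so and the function process_image are hypothetical stand-ins for your actual C++ entry point:

    import ctypes
    import numpy as np

    # Hypothetical C++ entry point; the real signature depends on your library:
    #   extern "C" void process_image(const uint8_t* data, int height, int width);
    lib = ctypes.CDLL("./libpipeline.so")
    lib.process_image.argtypes = [
        ctypes.POINTER(ctypes.c_uint8),  # raw pixel buffer
        ctypes.c_int,                    # height
        ctypes.c_int,                    # width
    ]
    lib.process_image.restype = None

    def send_zero_copy(img: np.ndarray) -> None:
        # Ensure the buffer is contiguous; this copies only if it isn't already.
        img = np.ascontiguousarray(img, dtype=np.uint8)
        ptr = img.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8))
        # C++ reads the array's memory in place: no serialization, no copy.
        lib.process_image(ptr, img.shape[0], img.shape[1])

The only potential copy here is np.ascontiguousarray, which is a no-op for arrays that are already contiguous; everything else passes a pointer, so the 70-100 ms tobytes() cost disappears entirely.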
Upvotes: 3