poukill

Reputation: 610

Fast communication between C++ and python using shared memory

In a cross-platform (Linux and Windows) real-time application, I need the fastest way to share data between a C++ process and a Python application, both of which I manage. I currently use sockets, but they are too slow for high-bandwidth data (4K images at 30 fps).
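
As a rough sizing sketch: the pixel format is not stated above, so this assumes uncompressed 3840x2160 frames at 3 bytes per pixel (8-bit RGB). Under that assumption, the throughput the link has to sustain can be estimated like this (required_bandwidth is a hypothetical helper name):

```python
def required_bandwidth(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to move uncompressed frames."""
    return width * height * bytes_per_pixel * fps


# Assumed values: 3840x2160 frames, 3 bytes per pixel (8-bit RGB).
# Neither the resolution nor the pixel format is given in the question.
rate = required_bandwidth(3840, 2160, 3, 30)
print(f"{rate / 1e6:.0f} MB/s")  # prints "746 MB/s"
```

A different pixel format scales the figure proportionally, for example 4 bytes per pixel for RGBA.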

I would ultimately like to use Python's multiprocessing shared memory, but my first attempts suggest it does not work. I create the shared memory in C++ using Boost.Interprocess and try to read it in Python like this:

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstdlib>  // std::system
#include <cstring>  // std::memset

int main(int argc, char* argv[])
{
    using namespace boost::interprocess;

    //Remove shared memory on construction and destruction
    struct shm_remove
    {
        shm_remove() { shared_memory_object::remove("myshm"); }
        ~shm_remove() { shared_memory_object::remove("myshm"); }
    } remover;

    //Create a shared memory object.
    shared_memory_object shm(create_only, "myshm", read_write);

    //Set size
    shm.truncate(1000);

    //Map the whole shared memory in this process
    mapped_region region(shm, read_write);

    //Write all the memory to 1
    std::memset(region.get_address(), 1, region.get_size());

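    //Keep the process, and with it the shared memory, alive until a key is pressed
    //(note: "pause" is a Windows shell command)
    std::system("pause");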
}

And my Python code:

from multiprocessing import shared_memory

if __name__ == "__main__":
    shm_a = shared_memory.SharedMemory(name="myshm", create=False)
    buffer = shm_a.buf
    print(buffer[0])

I get a system error: FileNotFoundError: [WinError 2] File not found. So I guess it only works internally within Python multiprocessing, right? Python does not seem to find the shared memory created on the C++ side.

Another possibility would be to use mmap, but I'm afraid it is not as fast as "pure" shared memory (which does not go through the filesystem). As stated in the Boost.Interprocess documentation:

However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory

I don't know how much slower it is, however. I would simply prefer the fastest solution, as this is currently the bottleneck of my application.

Upvotes: 4

Views: 5312

Answers (3)

CaptXan

Reputation: 49

For future viewers: I fixed this error by using Boost.Interprocess's windows_shared_memory instead of shared_memory_object on the C++ side.
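
Some background that is not stated in this thread, so treat it as a likely explanation rather than a confirmed one: on Windows, Boost.Interprocess emulates shared_memory_object with a memory-mapped file, whereas Python's SharedMemory opens a native named file mapping, which is the kind of object windows_shared_memory creates. The Python reader from the question should therefore not need to change. Below is a minimal sketch of that reader with the missing-name case handled. The producer in it is a Python stand-in for the C++ process, and read_first_byte and the generated segment name are hypothetical.

```python
from multiprocessing import shared_memory
import uuid


def read_first_byte(name):
    """Attach to an existing named segment and return its first byte.

    Returns None when no segment with that name is visible to Python.
    """
    try:
        shm = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return None
    try:
        return shm.buf[0]
    finally:
        shm.close()  # detach only; the creator owns the segment


if __name__ == "__main__":
    # Stand-in for the C++ producer: 1000 bytes, all set to 1.
    # The name is randomly generated for this demo (hypothetical).
    name = "demo_" + uuid.uuid4().hex[:8]
    producer = shared_memory.SharedMemory(name=name, create=True, size=1000)
    producer.buf[:1000] = b"\x01" * 1000
    print(read_first_byte(name))
    producer.close()
    producer.unlink()
```

The reader only calls close(), never unlink(), so that the process that created the segment stays responsible for its lifetime.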

Upvotes: 1

KRG

Reputation: 956

An example of communication between C++ and Python using shared memory and memory mapping can be found at https://stackoverflow.com/a/69806149/2625176.

Upvotes: 2

poukill

Reputation: 610

So I spent the last few days implementing shared memory using mmap, and the results are quite good in my opinion. Here are the benchmark results comparing my two implementations: pure TCP, and a mix of TCP and shared memory.

Protocol:

The benchmark consists of moving data from C++ to Python (as a numpy.ndarray), then sending the data back to the C++ process. No further processing is involved, only serialization, deserialization and inter-process communication (IPC).

Case A:

Communication is done with TCP {header + data}.

Case B:

  • One C++ process implementing TCP communication using Boost.Asio and shared memory (mmap) using Boost.Interprocess
  • One Python3 process using standard TCP sockets and mmap

Communication is hybrid: synchronization is done through sockets (only the header is passed) and the data is moved through shared memory. I think this design is great because I have suffered in the past from synchronization problems when using condition variables in shared memory, and TCP is easy to use in both C++ and Python.
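
The hybrid scheme can be sketched in Python roughly as follows. This is not the benchmarked implementation, only an illustration under several assumptions: the header layout (offset and length as two 64-bit integers), the backing file name and the helper names are all hypothetical, socket.socketpair() stands in for the TCP connection, and both mappings live in one process so that the example is self-contained.

```python
import mmap
import os
import socket
import struct
import tempfile

# Hypothetical header: payload offset and payload length, little-endian uint64.
HEADER = struct.Struct("<QQ")


def send_frame(sock, region, payload, offset=0):
    """Copy the payload into the shared region, then announce it on the socket."""
    region[offset:offset + len(payload)] = payload
    sock.sendall(HEADER.pack(offset, len(payload)))


def recv_frame(sock, region):
    """Block until a header arrives, then read the payload from the region."""
    raw = b""
    while len(raw) < HEADER.size:
        chunk = sock.recv(HEADER.size - len(raw))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        raw += chunk
    offset, length = HEADER.unpack(raw)
    return bytes(region[offset:offset + length])


def demo(size=1 << 20):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frames.bin")  # hypothetical backing file
        with open(path, "wb") as f:
            f.truncate(size)  # reserve the shared region
        # Each side opens and maps the same file independently.
        with open(path, "r+b") as fw, open(path, "r+b") as fr:
            writer = mmap.mmap(fw.fileno(), size)
            reader = mmap.mmap(fr.fileno(), size)
            a, b = socket.socketpair()  # stand-in for the TCP connection
            try:
                send_frame(a, writer, b"hello")
                return recv_frame(b, reader)
            finally:
                a.close()
                b.close()
                writer.close()
                reader.close()


if __name__ == "__main__":
    print(demo())
```

Only the 16-byte header crosses the socket; the payload is written once into the mapping and read from there. A real implementation would also need the receiver to acknowledge a frame before the sender reuses the same region.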

Results:

Big data at low frequency

200 MBytes/s total: 10 MByte samples at 20 samples per second

Case   Global CPU consumption   C++ part   Python part
A      17.5%                    10%        7.5%
B      6%                       1%         5%

Small data at high frequency

200 MBytes/s total: 0.2 MByte samples at 1000 samples per second

Case   Global CPU consumption   C++ part   Python part
A      13.5%                    6.7%       6.8%
B      11%                      5.5%       5.5%

Max bandwidth

  • A : 250 MBytes / second
  • B : 600 MBytes / second

Conclusion:

In my application, using mmap has a huge impact on big data at moderate frequency: global CPU consumption drops from 17.5% to 6%, almost a threefold gain. At very high frequencies with small data, the benefit of shared memory is still there but less impressive (about a 20% improvement, from 13.5% to 11%). Maximum throughput is more than twice as high (600 vs. 250 MBytes/s).

Using mmap is a good upgrade for me. I just wanted to share my results here.

Upvotes: 3
