Mark Gerolimatos
Mark Gerolimatos

Reputation: 2584

Boost: is there an interprocess::message_queue-like mechanism for thread-only communication?

The boost::interprocess::message_queue mechanism seems primarily designed for just that: interprocess communication.

The problem is that it serializes the objects in the message:

"A message queue just copies raw bytes between processes and does not send objects."

This makes it completely unsuitable for fast and repeated interthread communication with large composite objects being passed.

I want to create a message with a ref/shared_ptr/pointer to a known and previously-created object and safely pass it from one thread to the next.

You CAN use asio::io_service and post with bind completions, but that's rather klunky AND requires that the thread in question be using asio, which seems a bit odd.

I've already written my own, sadly based on asio::io_service, but would prefer to switch over to a boost-supported general mechansim.

Upvotes: 2

Views: 2204

Answers (2)

bazza
bazza

Reputation: 8434

Whilst I'm no expert in Boost per se, there is a fundamental difficulty in communicating between processes and threads via a pipe, message queue, etc, especially if it is assumed that a program's data is classes containing dynamically allocated memory (which is pretty much the case for things written with Boost; a string is not a simple object like it is in C...).

Copying of Data in Classes

Message queues and pipes are indeed just a way of passing a collection of bytes from one thread/process to another thread/process. Generally when you use them you're looking for the destination thread to end up with a copy of the original data, not just a copy of the references to the data (which would be pointing back at the original data).

With a simple C struct containing no pointers at all it's easy; a copy of the struct contains all the data, no problem. But a C++ class with complex data types like strings is now a structure containing references / pointers to allocated memory. Copy that structure and you haven't actually copied the data in the allocated memory.

That's where serialisation comes in. For interprocess communications where both processes can't ordinarily share the same memory serialisation serves as a way of parcelling up the structure to be sent plus all the data it refers to into a stream of bytes that can be unpacked at the other end. For threads it's no different if you don't want the two threads accessing the same memory at the same time. Serialisation is a convenient way of saving yourself having to navigating through a class to see exactly what needs to be copied.

Efficiency

I don't know what Boost uses for serialisation, but clearly serialising to XML would be painfully inefficient. A binary serialisation like ASN.1 BER would be much faster.

Also, copying data through pipes, message queues is no longer as inefficient as it used to be. Traditionally programmers don't do it because of the perceived waste of time spent copying the data repeatedly just to share it with another thread. With a single core machine that involves a lot of slow and wasteful memory accesses.

However, if one considers what "memory access" is in these days of QPI, Hypertransport, and so forth, it's not so very different to just copying the data in the first place. In both cases it involves data being sent over a serial bus from one core's memory controller to another core's cache.

Today's CPUs are really NUMA machines with memory access protocols layered on top of serial networks to fake an SMP environment. Programming in the style of copying messages through pipes, message queues, etc. is definitely edging towards saying that one is content with the idea of NUMA, and that really you don't need SMP at all.

Also, if you do all your inter-thread communications as message queues, they're not so very different to pipes, and pipes aren't so different to network sockets (at least that's the case on Not-Windows). So if you write your code carefully you can end up with a program that can be redeployed across a distributed network of computers or across a number of threads within a single process. That's a nice way of getting scalability because you're not changing the shape or feel of your program in any significant way when you scale up.

Fringe Benefits

Depending on the serialisation technology used there can be some fringe benefits. With ASN.1 you specify a message schema in which you set out the valid ranges of the message's contents. You can say, for example, that a message contains an integer, and it can have values between 0 and 10. The encoders and decoders generated by decent ASN.1 tools will automatically check that the data you're sending or receiving meets that constraint, and returns errors if not.

I would be surprised if other serialisers like Google Protocol Buffers didn't do a similar constraints check for you.

The benefit is that if you have a bug in your program and you try and send an out of spec message, the serialiser will automatically spot that for you. That can save a ton of time in debugging. Also it is something you definitely don't get if you share a memory buffer and protect it with a semaphore instead of using a message queue.

CSP

Communicating Sequential Processes and the Actor model are based on sending copies of data through message queues, pipes, etc. just like you're doing. CSP in particular is worth paying attention to because it's a good way of avoiding a lot of the pitfalls of multi-threaded software that can lurk undetected in source code.

There are some CSP implementations you can just use. There's JCSP, a class library for Java, and C++CSP, built on top of Boost to do CSP for C++. They're both from the University of Kent.

C++CSP looks quite interesting. It has a template class called csp::mobile, which is kind of like a Boost smart pointer. If you send one of these from one thread to another via a channel (CSP's word for a message queue) you're sending the reference, not the data. However, the template records which thread 'owns' the data. So a thread receiving a mobile now owns the data (which hasn't actually moved), and the thread that sent it can no longer access it. So you get the benefits of CSP without the overhead of copying the data.

It also looks like C++CSP is able to do channels over TCP; that's a very attractive feature, up scaling is a really simple possibility. JCSP works over network connections too.

Upvotes: 3

Slava
Slava

Reputation: 44268

You need a mechanism, that designed for interprocess communication because separate processes has separate address space and you cannot simply pass pointers except very spacial cases. For thread communication you can use standard containers like std::stack, std::queue and std::priority_queue to communicate between threads, you just need to provide proper synchronization through mutexes. Or you can use lock-free containers, which also provided by boost. What else would you need for interthread communication?

Upvotes: 2

Related Questions