Tom Seddon

Reputation: 2748

POSIX POLLOUT and blocking file descriptors

According to the POSIX poll docs, the POLLOUT flag indicates that "Normal data may be written without blocking." But how much data? Are there any fixed guarantees, or is there any way I can find out how much buffer space there is?

I'm writing to pipes and sockets. Generally, my program will not get EINTR, since all expected signals are handled via signalfd/kqueue; as far as I can tell, that means I can expect write always to block until all of the requested data has been transferred.

I'm not sure I want this. Ideally, I'd like to just write as much data as there's space in the buffer for, so that the code can get back to the poll loop ASAP without blocking. (If there's then data still left to write, my code can check for POLLOUT on the same FD again, and do the same thing again when space becomes available.)

Is there any way to make this work well? poll seems to work nicely for reading from blocking pipe/socket FDs: when the POLLIN flag is set, you read into a buffer, and you get as much as is available without blocking. But it's starting to look a bit inconvenient for writing! Am I missing something?

Or is this all just trying to tell me to use O_NONBLOCK?

I'm working on OS X and Linux.

Upvotes: 2

Views: 871

Answers (2)

JvO

Reputation: 3106

The only guarantee is that you can write at least 1 byte; in practice it will usually be more, since most OSes are smart enough to avoid silly window syndrome.

"I'm not sure I want this. Ideally, I'd like to just write as much data as there's space in the buffer for, so that the code can get back to the poll loop ASAP without blocking."

Then use O_NONBLOCK. The write() to a socket will return the number of bytes written to the internal buffer. If you try to write 10,000 bytes and write() returns 2,000 you still have 8,000 bytes left. It requires a bit more internal housekeeping (well, one pointer and one integer) but it is the most efficient way to keep the kernel buffer full.
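The housekeeping described above might look something like this in C. This is only a minimal sketch, not a definitive implementation: struct pending_write, set_nonblocking and flush_some are hypothetical names invented for illustration, and a write() that returns 0 is treated as an error here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping: the "one pointer and one integer". */
struct pending_write {
    const char *data;   /* next byte to send */
    size_t      left;   /* number of bytes still unsent */
};

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Write as much as the kernel will accept right now.
 * Returns 1 if everything was sent, 0 if data remains (wait for
 * POLLOUT and call again), -1 on a real error. */
int flush_some(int fd, struct pending_write *pw)
{
    assert(pw != NULL);
    while (pw->left > 0) {
        ssize_t n = write(fd, pw->data, pw->left);
        if (n > 0) {
            /* Partial or full write: advance past the bytes accepted. */
            pw->data += n;
            pw->left -= (size_t)n;
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;   /* kernel buffer is full; go back to poll() */
        } else if (n == -1 && errno == EINTR) {
            continue;   /* interrupted before writing anything; retry */
        } else {
            return -1;  /* real error (or an unexpected zero-length write) */
        }
    }
    return 1;
}
```

When flush_some returns 0, keep POLLOUT set for that descriptor and call it again the next time poll() reports the descriptor writable.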

Note that once you are done writing, you should clear the POLLOUT bit from the events field of that descriptor's pollfd entry, or your poll() call will keep returning immediately with "write more data!".
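One way to keep that flag in step with the amount of queued data is a small helper called before each poll(). This is a sketch under assumptions: update_write_interest and its bytes_pending parameter are hypothetical names, not part of any API.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper: ask for POLLOUT only while bytes remain queued,
 * leaving any other requested events (e.g. POLLIN) untouched. */
void update_write_interest(struct pollfd *pfd, size_t bytes_pending)
{
    assert(pfd != NULL);
    if (bytes_pending > 0)
        pfd->events = (short)(pfd->events | POLLOUT);
    else
        pfd->events = (short)(pfd->events & ~POLLOUT);
}
```

With the flag cleared, poll() goes back to sleeping until something else of interest happens on the descriptor set, instead of waking straight away because the socket is still writable.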

Upvotes: 1

bazza

Reputation: 8414

On the face of it the most you are guaranteed to be able to write is 1 byte.

Switching to non-blocking mode doesn't help improve things much either. Sure, you don't block but you then have to work out what to do with any data that didn't get written by the non-blocking write. poll() gives you the same results irrespective of a file descriptor's blocking mode.

You might try a couple of alternatives, asynchronous IO or a framework such as ZeroMQ.

Async IO

There's asynchronous IO, which effectively boils down to having a bunch of background threads do the writing for you. That's fine, but you then have to manage the data lifetime / ownership problem. You have to leave whatever is being written in place and untouched until you're told the asynchronous write has finished. This can be fiddly to say the least.

Framework

Another option is a message framework such as ZeroMQ. There are others: nanomsg (written by the same guy who did ZeroMQ), DDS, CORBA (if you dare), TASTE from ESA, etc.

Drop In Replacement

These all have different qualities, but ZeroMQ and nanomsg in particular are designed to be the least painful way of converting code from using standard socket and pipe APIs to using their own (nanomsg especially).

Messages, not Streams

All of these are message orientated, not stream orientated, and they all do a good job of managing the painful business of passing data across some sort of connection for you, and sort out the data lifetime problem too.

Underlying Transports

ZeroMQ and nanomsg will work across pipes, sockets, shared memory, etc. They can also incorporate ordinary file descriptors in their equivalents of poll() for when your code is waiting for input (useful if you have, say, an fd for a serial port to listen to as well).

Comparison with async IO

The difference between such frameworks and async IO is as follows. With a framework, the software at the other end of a connection has to use the same framework too, otherwise the two ends are unable to communicate. With async IO, the other end can still use ordinary synchronous socket function calls, because fundamentally it's all still just a stream connection.

The frameworks are all more opaque than async IO. Who knows how much data copying is going on inside the frameworks? This generally doesn't matter, the assumption being that your RAM is far quicker than your network.

0MQ/nanomsg vs the rest

The difference between ZeroMQ/nanomsg and things like DDS/CORBA/TASTE is that the latter also incorporate serialisation. They allow you to specify message structures in a platform-independent description language, which gets 'compiled' to whatever source code language you're using (C, C#, Java, etc). It's very natural, a bit like declaring C structs. This allows you to have a completely different platform and programming language at the other end of a connection and they will still talk. DDS and CORBA use IDL for this; TASTE uses ASN.1 (which is superior in every possible way), but TASTE is a bit difficult to get going with.

You can achieve the same effect with ZeroMQ/nanomsg by using, say, Google protocol buffers or ASN.1 (or any of the many others available) to serialise data, and ZeroMQ/nanomsg to transport the byte streams generated by the serialiser as messages.

Upvotes: 1
