Reputation: 8820

several questions about pipes

This is a follow-up to process hangs when writing large data to pipe where we identified problems in how I was using pipes. There was a helpful discussion in the comments but I have more questions, so posing them here.

If you must assume a programming language assume Perl since that's what I'm using (so I'll keep the tag). I don't know how much of this varies by language...

1. What is the definition of "flush" exactly? Before flushing, is the data in the pipe yet, or is the data only in the pipe after it has been flushed?

   1a. If it is not in the pipe yet, where is it?

   1b. If it is in the pipe, how is the reader prevented from reading it?

2. What is the reason/motivation for having the concept of flushing?

3. From a comment on the earlier post: "A pipe is 'block buffered' so stuff is there to read only once a block (4kB?) has been written or pipe got full (64kB?)" So is a pipe automatically flushed once it is full?

4. What if you want to write a single variable to a pipe which is larger than the pipe's entire size? Assuming you also have a process actively reading from the pipe, will the process be smart enough to write the variable in chunks, or will it just freeze because it can't put the variable all in the pipe at once?

Upvotes: 1

Answers (1)

mob

Reputation: 118635

Let's start with #2. Buffering and flushing of output handles is about efficiency. There is some overhead in a disk or a socket write operation, and in general it is more efficient to write 1000 bytes in a single operation than to write 4 bytes in 250 separate operations. For programs that are heavy on I/O but light on computation, this can make a huge difference. So I/O libraries maintain a memory "buffer" with a size that has been chosen for optimal writing efficiency. Let's assume that it is 8KB.

In normal, buffered operation, your process writes some bytes to the output handle. The I/O library copies these bytes to the "buffer" until the buffer is full. When the buffer is full, the I/O library performs the actual write operation to the pipe/socket/disk of the entire buffer. Then it erases the buffer and waits for more output.
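Here is a minimal sketch of that fill-then-write cycle, using a pipe inside a single process so it is easy to observe. It assumes the handle's buffer is no larger than 8KB (the figure assumed above) and that the pipe can hold at least 8KB; the variable names are just for illustration.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $sel = IO::Select->new($reader);

# A small print only fills part of the handle's buffer,
# so nothing has reached the pipe yet.
print {$writer} "x" x 100;
my @ready_small = $sel->can_read(0);

# Printing more than the buffer can hold forces the I/O library
# to write a full buffer to the pipe on its own.
print {$writer} "x" x 12_000;
my @ready_large = $sel->can_read(0);

printf "readable after small print: %d\n", scalar @ready_small;
printf "readable after large print: %d\n", scalar @ready_large;
```

Note that even after the second print, the tail end of the data may still be sitting in the buffer rather than in the pipe.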

(Input buffering is a thing, too. For example, you might ask for 23 bytes from an input handle, and the I/O library might read 8KB of data from the input channel, return 23 bytes to you, and keep the rest in memory for the next read requests you make.)
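To make the input side concrete, here is a small sketch with made-up data. It puts 10 bytes into a pipe, asks for 3 of them with the buffered read(), and then checks whether the pipe itself still has anything in it. It assumes the I/O library pulls all available bytes into its buffer on the first read, which is how I understand Perl's default layers to behave.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";

# syswrite bypasses the output buffer, so all 10 bytes go to the pipe.
(syswrite($writer, "abcdefghij") // 0) == 10 or die "short write: $!";

# Ask for 3 bytes through the buffered read().
read($reader, my $first, 3);

# Check the pipe itself: is there anything left to read at the OS level?
my @ready = IO::Select->new($reader)->can_read(0);

# This read is served from the handle's input buffer, not from the pipe.
read($reader, my $rest, 7);

printf "first=%s pipe_readable=%d rest=%s\n", $first, scalar @ready, $rest;
```

This is also why mixing select (or sysread) with buffered reads on the same handle tends to cause confusion: select only knows about the pipe, not about the bytes already held in the buffer.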

So now we will address #1. An obvious drawback of buffering is that bytes that have been ostensibly written to a file/pipe/socket might only exist in a buffer and so will not be available to a separate process that is reading the same data source, and you experience suffering from buffering. The amount of data that is not available to a reader can be as large as the size of the buffer on the output handle.

A "flush" operation on an output handle tells the I/O library to write the current buffer to disk/pipe/socket, even if the buffer is not full. And this does make the data available for a separate reader.
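A rough illustration of 1a and 1b, again with both ends of the pipe in one process: before the flush the data lives in the writer's buffer, so the reader has nothing to see; after the flush it is in the pipe. The handle names are hypothetical.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $sel = IO::Select->new($reader);

# This goes into the output handle's buffer only.
print {$writer} "hello\n";
my @before = $sel->can_read(0);
my $ready_before = scalar @before;

# flush hands the buffered bytes to the pipe, even though the buffer is not full.
$writer->flush;
my @after = $sel->can_read(0);
my $ready_after = scalar @after;

sysread($reader, my $got, 1024);
print "before flush: $ready_before, after flush: $ready_after, got: $got";
```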

In Perl, output handles can be "autoflushed", meaning that the flush operation will be performed after every write or print on the handle. With autoflushing, you lose the efficiency gains from I/O buffering but you get better responsiveness from your output generator. Most handles are not autoflushed by default, so you have to enable it yourself with a call like

WRITER->autoflush(1)

after you create the output handle. Without autoflush, output is written only when the output buffer is full, when you flush the handle explicitly, or when the output handle is closed.
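As a sketch of what autoflush changes, the following uses lexical handles rather than the bareword WRITER above (that is my choice, not a requirement). Loading IO::Handle is needed on older Perls for the method call to work and is harmless on newer ones.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";
$writer->autoflush(1);          # flush after every print on this handle

print {$writer} "ping\n";       # no explicit flush needed now

# The data should already be in the pipe.
my @ready = IO::Select->new($reader)->can_read(0);
my $visible = scalar @ready;

sysread($reader, my $line, 1024);
print "visible right after print: $visible, line: $line";
```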

#3. Pipes and sockets also have a finite capacity, which has nothing to do with the size of the I/O buffer we've been talking about. (Files also have a finite capacity, since you will eventually run out of disk space.) When you have written enough output to a pipe or socket that it is full, your write operation will block. Pipes and sockets get emptied when other processes read from them. Once there is enough room for the contents of your write operation, the operation will continue. In principle, you could write on the output handle up to the pipe's capacity (let's say 64KB) plus up to the size of the output handle's buffer (~8KB) before your operation would block.
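If you are curious what the pipe capacity is on your system, one way to estimate it is to switch the write end to non-blocking mode and keep writing until the pipe refuses more. This is only a sketch: the 1KB chunk size is arbitrary, syswrite is used so the handle's own buffer is not counted, and the number you get depends on the operating system.

```perl
use strict;
use warnings;
use IO::Handle;
use Errno;

pipe(my $reader, my $writer) or die "pipe failed: $!";
$writer->blocking(0);    # a full pipe now makes syswrite fail instead of block

my $chunk    = "x" x 1024;
my $capacity = 0;
while (1) {
    my $n = syswrite($writer, $chunk);
    if (!defined $n) {
        # EAGAIN/EWOULDBLOCK means the pipe is full; anything else is a real error.
        last if $!{EAGAIN} || $!{EWOULDBLOCK};
        die "syswrite failed: $!";
    }
    $capacity += $n;
}
print "pipe accepted $capacity bytes before filling up\n";
```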

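#4. So what will happen if you attempt a single very large write to a pipe? It depends, but there is no guarantee that it will be anything good. With an ordinary blocking handle, the write can only make progress as fast as the reader drains the pipe, and if nothing is reading the other end it will block indefinitely. So it is best to take care not to do that. For use cases where the amount of interprocess data is large relative to pipe capacity, consider using a file instead.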
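For the specific case in the question, where another process is actively reading, here is a sketch of a 1MB write through a pipe between a parent and a forked child. The payload size and the 64KB read size are arbitrary. On the systems I am familiar with this completes, because the writer is simply held up until the reader makes room; it would hang if the reader stopped reading or waited on the writer first.

```perl
use strict;
use warnings;
use POSIX ();

my $payload = "x" x (1024 * 1024);    # far larger than a typical pipe

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: the writer. One print; the I/O layers hand it over in pieces.
    close $reader;
    print {$writer} $payload;
    close $writer or die "close failed: $!";
    POSIX::_exit(0);
}

# Parent: the reader. Close our copy of the write end, or we never see EOF.
close $writer;
my $received = 0;
while (my $n = sysread($reader, my $buf, 65536)) {
    $received += $n;
}
close $reader;
waitpid($pid, 0);
print "received $received bytes\n";
```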

Upvotes: 3
