First User

Reputation: 761

Confused about buffered and unbuffered stdout/stderr in C and Python

I am confused about a couple of things when it comes to the issue of stdout and stderr being buffered/unbuffered:

1)

Is the statement "stdout/err is buffered/unbuffered" decided by my operating system or by the programming-language library functions (in particular, the write() or print() functions) that I am working with?

While programming in C, I have always gone by the rule that stdout is buffered while stderr is unbuffered. I have seen this in action by calling sleep() after putchar() statements within a while loop: the individual characters appeared on stderr one by one, while only complete lines appeared on stdout. When I tried to replicate this program in Python, stderr and stdout showed the same behaviour: both produced complete lines. So I looked this up and found a post that said:

sys.stderr is line-buffered by default since Python 3.9.

Hence the question - because I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS, but apparently code libraries are free to implement their own behaviour? Can I hypothetically write a routine that writes to stdout without a buffer?

The relevant code snippets for reference:

/* C */
int c;  /* int, not char, so EOF can be distinguished from valid bytes */
while ((c = fgetc(file)) != EOF) {
    fputc(c, stdout /* or stderr */);
    usleep(800);
}

# Python
for line in file:
    for ch in line:
        print(ch, end='', file=sys.stdout)  # or sys.stderr
        time.sleep(0.08)

2)

Secondly, my understanding of the need for buffering is that, since disk access is slower than RAM access, writing individual bytes would be inefficient, and thus bytes are written in blocks. But is writing to a device file like /dev/stdout or /dev/stdin the same as writing to disk? (Isn't disk supposed to be permanent? Stuff written to stdout or stderr only appears in the terminal, if connected, and is then lost, right?)

3)

Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?

Upvotes: 3

Views: 1911

Answers (2)

John Bollinger

Reputation: 180113

Is the statement "stdout/err is buffered/unbuffered" decided by my operating system or by the programming-language library functions (in particular, the write() or print() functions) that I am working with?

Mostly it is decided by the programming language implementation, and programming languages standardize this. For example, the C language specification says:

At program startup, three text streams are predefined and need not be opened explicitly — standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). As initially opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.

(C2017, paragraph 7.21.3/7)

Similarly, the Python docs for sys.stdin, sys.stdout, and sys.stderr say:

When interactive, the stdout stream is line-buffered. Otherwise, it is block-buffered like regular text files. The stderr stream is line-buffered in both cases. You can make both streams unbuffered by passing the -u command-line option or setting the PYTHONUNBUFFERED environment variable.

Be aware, however, that both of those particular languages provide mechanisms to change the buffering of the standard streams (or in the Python case, at least stdout and stderr).
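One such mechanism can be sketched in Python with `io.TextIOWrapper.reconfigure()` (available since Python 3.7); the `io.BytesIO` standing in for a real device is the only artificial part:

```python
import io

# Sketch: changing a text stream's buffering behaviour at runtime.
# A BytesIO stands in for the underlying device so the effect is visible.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="utf-8", write_through=False)

stream.write("hello")       # sits in the wrapper's internal buffer
before = raw.getvalue()     # the "device" has received nothing yet

stream.flush()              # push the buffered text down to the BytesIO
stream.reconfigure(write_through=True)   # disable write buffering
stream.write("world")       # now passed straight through on each write
after = raw.getvalue()
```

The same idea applies to `sys.stdout` and `sys.stderr`, which are `TextIOWrapper` objects in CPython; in C, the analogous knob is `setvbuf()`.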

Moreover, the above is relevant only if you are using streams (C) or file objects (Python). In C, this is what all of the stdio functions use -- printf(), fgets(), fwrite(), etc. -- but it is not what (say) the POSIX raw I/O functions such as read() and write() use. If you use raw I/O interfaces such as the latter, then there is only whatever buffering you perform manually.
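The raw, unbuffered level is also reachable from Python via `os.read()`/`os.write()`, which operate directly on file descriptors, much like the POSIX calls. A minimal sketch using a pipe:

```python
import os

# Sketch: raw file-descriptor I/O, the Python analogue of POSIX
# read()/write(). Each os.write() is a single syscall; no library-level
# buffer sits in between.
r, w = os.pipe()
os.write(w, b"one syscall, no buffering\n")
os.close(w)
data = os.read(r, 1024)
os.close(r)
```

Passing `sys.stdout.fileno()` (usually 1) to `os.write()` would bypass the stream's buffer in the same way, with the caveats about mixing raw and buffered I/O noted below.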

Hence the question - because I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS

No. The OS (at least Unixes (including Mac) and Windows) does not perform I/O buffering on behalf of programs. Programming language implementations do, under some circumstances, and they are then in control of the details.

but apparently code libraries are free to implement their own behaviour?

It's a bit more nuanced than that, but basically yes.

Can I hypothetically write a routine that writes to stdout without a buffer?

Maybe. In C or Python, at least, you can exert some control over the buffering mode of the stdout stream. In C you can adjust it dynamically at runtime, but in Python I think the buffering mode is decided when Python starts.

You may also be able to bypass the buffer of a buffered stream by performing (raw) I/O on the underlying file descriptor, but this is extremely poor form, and depending on the details, it may produce undefined behavior.
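As a concrete illustration in Python, `open(..., buffering=0)` gives a fully unbuffered binary stream: every `.write()` goes straight to the OS, so a second reader sees the data immediately, without the writer flushing or closing first. (A temporary file stands in here for whatever device stdout might be attached to.)

```python
import os
import tempfile

# Sketch: an unbuffered binary stream via open(..., buffering=0).
fd, path = tempfile.mkstemp()
os.close(fd)

writer = open(path, "wb", buffering=0)
writer.write(b"visible immediately")   # no flush needed

with open(path, "rb") as reader:       # opened while the writer is still open
    seen = reader.read()

writer.close()
os.remove(path)
```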

Secondly, my understanding of the need for buffering is that: since disk access is slower than RAM access, writing individual bytes would be inefficient and thus bytes are written in blocks.

All I/O is slow, even I/O to a terminal. Disk I/O tends to be especially slow, but program performance generally benefits from buffering I/O to all devices.

But is writing to a device file like /dev/stdout and /dev/stdin the same as writing to disk?

Sometimes it is exactly writing to disk (look up I/O redirection). Different devices do have different performance characteristics, so buffering may improve performance more with some than with others, but again, all I/O is slow.

Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?

The point of stderr being unbuffered (by default) in C is so that messages directed there are written to the underlying device (often a terminal) as soon as possible. Efficiency is not really a concern for the kinds of messages that this policy is most intended to serve.
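The practical consequence is easy to observe from Python: when a child process's stdout is a pipe, it is block-buffered and only flushed at exit, while stderr is line-buffered, so a diagnostic written *after* the normal output can still arrive *first*. A sketch that merges both streams into one pipe to capture arrival order:

```python
import subprocess
import sys

# Child prints to stdout, then to stderr. With stdout piped (so
# block-buffered) the stdout line waits in the buffer until exit,
# while the stderr line is flushed immediately.
code = 'import sys; print("out"); print("err", file=sys.stderr)'
result = subprocess.run(
    [sys.executable, "-c", code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # interleave both streams in arrival order
)
lines = result.stdout.splitlines()   # stderr's line comes first
```

This inversion of message order is exactly what unbuffered (or line-buffered) stderr is meant to limit: you want the error visible as soon as possible, even if the program later crashes without flushing stdout.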

Upvotes: 3

Kaia

Reputation: 891

https://linux.die.net/man/3/stderr, https://linux.die.net/man/3/setbuf, and https://linux.die.net/man/2/write are helpful resources here

  • If you use the raw syscall write, there won't be buffering. I'd imagine the same is true for WinAPI but I don't know.
  • Python and C want to make it easier to write things, so they wrap the raw syscalls in a file pointer (in C) / file object (in Python). This wrapper, in addition to storing the raw file descriptor used to make the syscalls, can optionally do things like buffering to reduce the number of syscalls you're making.
  • You can change the buffering settings of a file or stream. (In C that's setbuf; I'm not sure for Python.)
  • C and Python just happen to have different default configurations of stderr's wrapper.

For 2), writing to a pipe is usually much faster than writing to disk, but it's still a relatively slow operation compared to memcpy or the like, which is what buffering essentially is. The processor has to jump into kernel mode and back.
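The kernel-round-trip cost is measurable from Python: compare many small writes through a default (buffered) stream against an unbuffered one (`buffering=0`), where each `.write()` is its own syscall. A rough sketch, not a rigorous benchmark:

```python
import os
import tempfile
import time

def timed_writes(path, buffered):
    """Write 20,000 small payloads; return elapsed seconds."""
    payload = b"x" * 16
    f = open(path, "wb") if buffered else open(path, "wb", buffering=0)
    t0 = time.perf_counter()
    for _ in range(20_000):
        f.write(payload)
    f.close()
    return time.perf_counter() - t0

fd, path = tempfile.mkstemp()
os.close(fd)
t_buffered = timed_writes(path, True)
t_unbuffered = timed_writes(path, False)   # typically several times slower
size = os.path.getsize(path)
os.remove(path)
```

The buffered variant batches the payloads into a handful of large writes; the unbuffered one pays the mode switch 20,000 times.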

For 3), I'd guess that C developers decided it was more important to get errors on-time than to get performance. In general, if your program is spitting out lots of data to stderr you have bigger problems than performance.

Upvotes: 2
