Reputation: 761
I am confused about a couple of things when it comes to the issue of stdout and stderr being buffered/unbuffered:
Is the statement "stdout/stderr is buffered/unbuffered" decided by my operating system or by the programming language library functions (particularly the write() or print() functions) that I am working with?
While programming in C, I have always gone by the rule that stdout is buffered while stderr is unbuffered. I have seen this in action by calling sleep() after putchar() statements within a while loop: the individual characters were placed on stderr one by one, while only complete lines appeared on stdout. When I tried to replicate this program in Python, both stderr and stdout had the same behaviour: both produced complete lines - so I looked this up and found a post that said:
sys.stderr is line-buffered by default since Python 3.9.
Hence the question - because I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS, but apparently code libraries are free to implement their own behaviour? Can I hypothetically write a routine that writes to stdout without a buffer?
The relevant code snippets for reference:
/* C - assumes `file` is an open FILE *; c must be an int to hold EOF */
int c;
while ((c = fgetc(file)) != EOF) {
    fputc(c, stdout /* or stderr */);
    usleep(800); /* 800 microseconds */
}
# Python - assumes `file` is an open file object
for line in file:
    for ch in line:
        print(ch, end='', file=sys.stdout)  # or sys.stderr
        time.sleep(0.08)
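(For reference, the per-character behaviour can be reproduced in Python by flushing after every write. A minimal sketch; the drip helper name is my own, not from any library:)

```python
import sys
import time

def drip(text, delay=0.0, stream=None):
    """Write text one character at a time, flushing after each
    character so it appears immediately even on a buffered stream."""
    if stream is None:
        stream = sys.stdout
    for ch in text:
        stream.write(ch)
        stream.flush()  # defeat line/block buffering for this write
        time.sleep(delay)

drip("hello\n", delay=0.08)
```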
Secondly, my understanding of the need for buffering is that since disk access is slower than RAM access, writing individual bytes would be inefficient, so bytes are written in blocks. But is writing to a device file like /dev/stdout or /dev/stdin the same as writing to disk? (Isn't disk supposed to be permanent? Stuff written to stdout or stderr only appears in the terminal, if connected, and is then lost, right?)
Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?
Upvotes: 3
Views: 1911
Reputation: 180113
Is the statement "stdout/stderr is buffered/unbuffered" decided by my operating system or by the programming language library functions (particularly the write() or print() functions) that I am working with?
Mostly it is decided by the programming language implementation, and programming languages standardize this. For example, the C language specification says:
At program startup, three text streams are predefined and need not be opened explicitly — standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). As initially opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
(C2017, paragraph 7.21.3/7)
Similarly, the Python docs for sys.stdin, sys.stdout, and sys.stderr say:
When interactive, the stdout stream is line-buffered. Otherwise, it is block-buffered like regular text files. The stderr stream is line-buffered in both cases. You can make both streams unbuffered by passing the -u command-line option or setting the PYTHONUNBUFFERED environment variable.
Be aware, however, that both of those particular languages provide mechanisms to change the buffering of the standard streams (or in the Python case, at least stdout and stderr).
Moreover, the above is relevant only if you are using streams (C) or file objects (Python). In C, this is what all of the stdio functions use -- printf(), fgets(), fwrite(), etc. -- but it is not what (say) the POSIX raw I/O functions such as read() and write() use. If you use raw I/O interfaces such as the latter, then there is only whatever buffering you perform manually.
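The same split exists in Python: os.write() is a thin wrapper over the raw system call and does no userspace buffering, while sys.stdout is a buffered file object sitting on top of the same file descriptor. A small sketch of the contrast:

```python
import os
import sys

# Buffered file-object write: may sit in a userspace buffer until a
# newline (line-buffered) or until the buffer fills (block-buffered).
sys.stdout.write("via the buffered stream\n")
sys.stdout.flush()  # flush so the raw write below cannot overtake it

# Raw write on file descriptor 1: one write() syscall, no buffering.
os.write(sys.stdout.fileno(), b"via the raw file descriptor\n")
```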
Hence the question - because I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS
No. The OS (at least Unixes (including Mac) and Windows) does not perform I/O buffering on behalf of programs. Programming language implementations do, under some circumstances, and they are then in control of the details.
but apparently code libraries are free to implement their own behaviour?
It's a bit more nuanced than that, but basically yes.
Can I hypothetically write a routine that writes to stdout without a buffer?
Maybe. In C or Python, at least, you can exert some control over the buffering mode of the stdout stream. In C you can adjust it dynamically at runtime, but in Python I think the buffering mode is decided when Python starts.
You may also be able to bypass the buffer of a buffered stream by performing (raw) I/O on the underlying file descriptor, but this is extremely poor form, and depending on the details, it may produce undefined behavior.
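As a concrete illustration in Python: you can obtain an unbuffered handle on standard output by reopening its file descriptor in binary mode, since buffering=0 is only permitted for binary streams. This is a sketch of one way to do it, not the only one:

```python
import os
import sys

sys.stdout.flush()  # push out anything already buffered, preserving order

# Wrap fd 1 in an unbuffered binary stream: each write() call goes
# straight to the OS with no userspace buffer in between.
unbuffered = os.fdopen(sys.stdout.fileno(), "wb", buffering=0, closefd=False)
unbuffered.write(b"reaches the device immediately\n")
```

(closefd=False keeps the wrapper from closing the real stdout descriptor when it is garbage-collected.)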
Secondly, my understanding of the need for buffering is that: since disk access is slower than RAM access, writing individual bytes would be inefficient and thus bytes are written in blocks.
All I/O is slow, even I/O to a terminal. Disk I/O tends to be especially slow, but program performance generally benefits from buffering I/O to all devices.
But is writing to a device file like /dev/stdout and /dev/stdin the same as writing to disk?
Sometimes it is exactly writing to disk (look up I/O redirection). Different devices do have different performance characteristics, so buffering may improve performance more with some than with others, but again, all I/O is slow.
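One way to see this for yourself is to ask the OS what file descriptor 1 actually refers to: under `prog > out.txt` it is a regular disk file, at a terminal it is a character device, and under `prog | other` it is a pipe. A sketch using os.fstat:

```python
import os
import stat
import sys

# Ask the kernel what kind of file stdout's descriptor refers to.
mode = os.fstat(sys.stdout.fileno()).st_mode

if stat.S_ISREG(mode):
    kind = "regular file (e.g. redirected: prog > out.txt)"
elif stat.S_ISCHR(mode):
    kind = "character device (e.g. a terminal)"
elif stat.S_ISFIFO(mode):
    kind = "pipe (e.g. prog | other)"
else:
    kind = "something else"

print("stdout is a", kind, file=sys.stderr)
```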
Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?
The point of stderr being unbuffered (by default) in C is so that messages directed there are written to the underlying device (often a terminal) as soon as possible. Efficiency is not really a concern for the kinds of messages that this policy is most intended to serve.
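A small experiment makes the point: if a program dies without flushing, buffered stdout data can be lost, while a promptly-written stderr message survives. This sketch spawns a child Python process that exits via os._exit(), which skips the normal interpreter cleanup that would flush stdio buffers:

```python
import os
import subprocess
import sys

# Child: writes to both streams, then exits without flushing buffers.
child = (
    "import os, sys\n"
    "sys.stdout.write('partial progress')\n"  # block-buffered when piped
    "sys.stderr.write('error: broke\\n')\n"   # flushed by the newline
    "os._exit(1)\n"                           # no cleanup, no flush
)

# Drop PYTHONUNBUFFERED so the child gets default buffering.
env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True, env=env)

print("captured stdout:", repr(result.stdout))  # likely '' - lost in buffer
print("captured stderr:", repr(result.stderr))  # the error message got out
```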
Upvotes: 3
Reputation: 891
https://linux.die.net/man/3/stderr, https://linux.die.net/man/3/setbuf, and https://linux.die.net/man/2/write are helpful resources here
For 1), if you write through the raw syscall write, there won't be buffering. I'd imagine the same is true for WinAPI but I don't know. (In C you can change a stream's buffering with setbuf; I'm not sure for Python.)
For 2), writing to a pipe is usually much faster than writing to disk, but it's still a relatively slow operation compared to memcpy or the like, which is what buffering essentially is. The processor has to jump into kernel mode and back.
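That kernel round-trip cost is easy to measure. The sketch below (variable names are my own) writes the same bytes to /dev/null twice: once with a write() syscall per byte, and once through a userspace buffer that is flushed in large blocks:

```python
import io
import os
import time

data = b"x" * 50_000
fd = os.open(os.devnull, os.O_WRONLY)

# One syscall per byte: 50,000 trips into the kernel and back.
t0 = time.perf_counter()
for i in range(len(data)):
    os.write(fd, data[i:i + 1])
per_byte = time.perf_counter() - t0

# Buffered: bytes accumulate in userspace (default 8 KiB buffer),
# so only a handful of syscalls actually happen.
writer = io.BufferedWriter(io.FileIO(fd, "wb", closefd=False))
t0 = time.perf_counter()
for i in range(len(data)):
    writer.write(data[i:i + 1])
writer.flush()
buffered = time.perf_counter() - t0

os.close(fd)
print(f"per-byte syscalls: {per_byte:.4f}s, buffered: {buffered:.4f}s")
```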
For 3), I'd guess that C developers decided it was more important to get errors out on time than to get performance. In general, if your program is spitting out lots of data to stderr, you have bigger problems than performance.
Upvotes: 2