Tim

How does the size of the buffer argument to read() and write() affect I/O performance?

Here is an I/O example from Advanced Programming in the UNIX Environment:

#include "apue.h"

#define BUFFSIZE 4096

int
main(void)
{
    int     n;
    char    buf[BUFFSIZE];

    while ((n = read(STDIN_FILENO, buf, BUFFSIZE)) > 0)
        if (write(STDOUT_FILENO, buf, n) != n)
            err_sys("write error");

    if (n < 0)
        err_sys("read error");

    exit(0);
}

All normal UNIX system shells provide a way to open a file for reading on standard input and to create (or rewrite) a file on standard output, allowing the user to take advantage of the shell's I/O redirection facilities.

Figure 3.6 shows the results for reading a 516,581,760-byte file, using 20 different buffer sizes, with standard output redirected to /dev/null. The file system used for this test was the Linux ext4 file system with 4,096-byte blocks (the st_blksize value is 4,096). This accounts for the minimum in the system time occurring around a BUFFSIZE of 4,096; increasing the buffer size beyond this limit has little positive effect.

[Figure 3.6: timing results for reading the file with different buffer sizes]
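For reference, the st_blksize value mentioned above can be queried at runtime with fstat(). A minimal sketch (not from the book), run with the file redirected to standard input as in the example:

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
    struct stat st;

    /* st_blksize is the filesystem's preferred I/O block size for
       this file (4,096 on the ext4 setup described above). */
    if (fstat(STDIN_FILENO, &st) < 0) {
        perror("fstat");
        return 1;
    }
    printf("st_blksize = %ld\n", (long)st.st_blksize);
    return 0;
}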

  1. How does BUFFSIZE affect the performance of reading a file?

    As BUFFSIZE increases up to 4,096, why does performance improve? As BUFFSIZE increases beyond 4,096, why is there no significant further improvement?

  2. Does the kernel's buffering (as opposed to the buf array of size BUFFSIZE in the program) help performance, and how does that relate to BUFFSIZE?

    When BUFFSIZE is small, does the kernel's buffering accumulate the small writes, so as to improve performance?

Upvotes: 3

Views: 621

Answers (1)

Jeremy Friesner

Each call to read() and write() requires a system call (to communicate with the kernel), plus the time to actually copy the data into (or out of) the kernel's memory space.

The system call itself imposes a fixed (per-call) overhead, while the cost of copying the data is, of course, proportional to the amount of data there is to copy.

Therefore, if you read()/write() very small buffers, the overhead of making the system call will be relatively high compared to the number of bytes of data copied; and since you'll have to make a large number of calls, the overall runtime will be longer than if you had done larger transfers.
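To put rough numbers on that, using the 516,581,760-byte file from the question: a BUFFSIZE of 1 requires roughly half a billion read()/write() pairs, while a BUFFSIZE of 4,096 needs only about 126,000, so the fixed per-call cost is paid vastly fewer times.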

Calling read()/write() a smaller number of times with larger buffers allows the system to amortize the overhead of the system call over a larger number of bytes-per-call, avoiding that inefficiency. However, at some point as sizes get larger, the system-call overhead becomes completely negligible, and at that point the program's efficiency is governed entirely by the cost of transferring the data, which is determined by the speed of the hardware. That's why you see performance leveling out as sizes get larger.
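You can observe this yourself with a small sketch (mine, not from the book or the answer) that runs the same copy loop with the buffer size taken from the command line; timing runs such as time ./copy 64 < file > /dev/null versus time ./copy 4096 < file > /dev/null (copy is just an assumed name for the compiled program) shows the per-call overhead shrinking as the buffer grows:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    ssize_t n;
    size_t  bufsize;
    char   *buf;

    if (argc != 2 || (bufsize = (size_t)atol(argv[1])) == 0) {
        fprintf(stderr, "usage: %s bufsize\n", argv[0]);
        return 1;
    }
    if ((buf = malloc(bufsize)) == NULL) {
        perror("malloc");
        return 1;
    }

    /* Same loop as the book's example: each iteration costs one read()
       and one write() system call, so smaller buffers mean more
       iterations and more per-call overhead for the same file. */
    while ((n = read(STDIN_FILENO, buf, bufsize)) > 0) {
        if (write(STDOUT_FILENO, buf, (size_t)n) != n) {
            perror("write");
            return 1;
        }
    }
    if (n < 0) {
        perror("read");
        return 1;
    }

    free(buf);
    return 0;
}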

read() and write() do not accumulate small writes together, since they represent direct system calls. If you want small reads/writes to be buffered that way, the C runtime provides fread() and fwrite() wrappers that will do that for you inside your process-space.
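As an illustration of that stdio buffering, here is a rough sketch (my own, not part of the answer) that performs deliberately small transfers through fread()/fwrite(); stdio accumulates them in its own user-space buffer and issues far fewer, larger read()/write() calls underneath:

#include <stdio.h>

int
main(void)
{
    char   buf[64];   /* deliberately small application buffer */
    size_t n;

    /* fread()/fwrite() buffer inside the process (typically BUFSIZ
       bytes), so these small transfers are coalesced rather than
       each becoming its own system call. */
    while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0) {
        if (fwrite(buf, 1, n, stdout) != n) {
            fprintf(stderr, "fwrite error\n");
            return 1;
        }
    }
    if (ferror(stdin)) {
        fprintf(stderr, "fread error\n");
        return 1;
    }
    return 0;
}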

Upvotes: 4
