Reputation: 225

Amount of data read() syscall will actually read

Suppose I have a file for which the file descriptor has more than n bytes left until EOF, and I invoke the read() syscall for n bytes. Is the function guaranteed to read n bytes into the buffer? Or can it read less?

Upvotes: 2

Answers (3)

aafulei

Reputation: 2255

Is the function guaranteed to read n bytes into the buffer? Or can it read less?

No, even if your file has more than n bytes before its end, the read(fd, buf, n) function is not guaranteed to read n bytes into bufffer and then return n. It can read less and return a positive value that is less than n.

See Linux man page at https://man7.org/linux/man-pages/man2/read.2.html

RETURN VALUE

It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.

Upvotes: 1

Steve Summit

Reputation: 48133

The read system call is guaranteed to read as many many characters as you asked for, except when it can't. But it turns out that there are so many exceptions -- so many cases where it can't read as many characters as you asked for -- that it basically ends up being safest to assume that any given read call probably won't read as many characters as you asked for. I believe it's good practice to always write your code with that in mind.

The man page on my system says

The system guarantees to read the number of bytes requested if the descriptor references a normal file that has that many bytes left before the end-of-file, but in no other case.

So if it's not a normal file, or if it is a normal file but there aren't enough characters, you'll get fewer than you asked for. But in the case you asked about, yes, you should be guaranteed to get exactly as many characters as you asked for.

With that said, though, if you find yourself with a choice between assuming that read is allegedly guaranteed to read exactly the number of characters requested, versus acknowledging that it might return less, I would always write the code to assume it might return less. That is, if you have a call like

r = read(fd, buf, n);

there isn't usually much to be gained by assuming that if r is greater than 0, it must be exactly n. Your code has to be able to handle the r < n case so it will behave properly when it's almost at end-of-file, so unless you want to have two different code paths (one for "normal" reads, and one for the last read), you might as well write one piece of code, that can handle the r < n case, and let it operate all the time.

(Also, as Zan Lynx reminds in a comment, don't have the code notice that r < n, and infer from that that end-of-file is coming up soon. Wait for r == 0 before deciding you're at end-of-file.)

Upvotes: 6

Antti Haapala -- Слава Україні

Reputation: 134076

You could've read it from the man page yourself:

On Linux, read() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)

So even if you had enough RAM and so on, you couldn't read a full-size DVD image in one go - however, this wouldn't be the sane thing to do either; to access such large files, mmap would be better.

Other than that, a signal might be delivered, which can cause exit with EINTR and buffer contents indeterminate.

ERRORS

[...]

EINTR The call was interrupted by a signal before any data was read; see signal(7).

Upvotes: 2

Amount of data read() syscall will actually read

Answers (3)

Related Questions