Reputation: 578
Here is what man writev says:
The data transfers performed by readv() and writev() are atomic: the data written by writev() is written as a single block that is not intermingled with output from writes in other processes (but see pipe(7) for an exception); analogously, readv() is guaranteed
This is from man 7 pipe:
O_NONBLOCK disabled, n <= PIPE_BUF
    All n bytes are written atomically; write(2) may block if there is not room for n bytes to be written immediately

O_NONBLOCK enabled, n <= PIPE_BUF
    If there is room to write n bytes to the pipe, then write(2) succeeds immediately, writing all n bytes; otherwise write(2) fails, with errno set to EAGAIN.

O_NONBLOCK disabled, n > PIPE_BUF
    The write is nonatomic: the data given to write(2) may be interleaved with write(2)s by other process; the write(2) blocks until n bytes have been written.

O_NONBLOCK enabled, n > PIPE_BUF
    If the pipe is full, then write(2) fails, with errno set to EAGAIN. Otherwise, from 1 to n bytes may be written (i.e., a "partial write" may occur; the caller should check the return value from write(2) to see how many bytes were actually written), and these bytes may be interleaved with writes by other processes.
$ cat writev.c
#include <string.h>
#include <sys/uio.h>

int
main(int argc, char **argv) {
    static char part1[] = "ST";
    static char part2[] = "\n";
    struct iovec iov[2];

    /* Two segments, 3 bytes in total: far below PIPE_BUF */
    iov[0].iov_base = part1;
    iov[0].iov_len = strlen(part1);
    iov[1].iov_base = part2;
    iov[1].iov_len = strlen(part2);

    /* One gathered write to stdout (fd 1) */
    writev(1, iov, 2);
    return 0;
}
$ gcc writev.c
$ unbuffer bash -c 'for ((i=0; i<50; i++)); do ./a.out & ./a.out; done' | wc -c
300 # < PIPE_BUF
# Run the following several times to get the output corrupted
$ unbuffer bash -c 'for ((i=0; i<50; i++)); do ./a.out & ./a.out; done' | sort | uniq -c
4
92 ST
4 STST
If writev is atomic (according to the documentation), can anybody explain why the outputs of different writes are interleaved?
Update:
Some relevant data from strace -fo /tmp/log unbuffer bash -c 'for ((i=0; i<10000; i++)); do ./a.out & ./a.out; done' | sort | uniq -c
13301 writev(1, [{iov_base="ST", iov_len=2}, {iov_base="\n", iov_len=1}], 2 <unfinished ...>
13302 mprotect(0x56397d7d8000, 4096, PROT_READ) = 0
13302 mprotect(0x7f7190c68000, 4096, PROT_READ) = 0
13302 munmap(0x7f7190c51000, 90695) = 0
13302 writev(1, [{iov_base="ST", iov_len=2}, {iov_base="\n", iov_len=1}], 2) = 3
13301 <... writev resumed> ) = 3
24814 <... select resumed> ) = 1 (in [4])
13302 exit_group(0 <unfinished ...>
13301 exit_group(0 <unfinished ...>
13302 <... exit_group resumed>) = ?
13301 <... exit_group resumed>) = ?
24814 futex(0x55b5b8c11cc4, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
24807 <... futex resumed> ) = 0
24814 <... futex resumed> ) = 1
24807 futex(0x7f7f55e8f920, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
13302 +++ exited with 0 +++
24807 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
13301 +++ exited with 0 +++
24807 futex(0x7f7f55e8f920, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
24814 futex(0x7f7f55e8f920, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
24807 <... futex resumed> ) = 0
24814 <... futex resumed> ) = 0
24807 read(4, <unfinished ...>
24814 select(6, [5], [], [], NULL <unfinished ...>
24807 <... read resumed> "STST\n\n", 4096) = 6
24808 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13302
24807 write(1, "STST\n\n", 6 <unfinished ...>
Upvotes: 3
Views: 1188
Reputation: 215259
As specified, yes for pipes, when the total iov length does not exceed PIPE_BUF, because:
The writev() function shall be equivalent to write(), except as described below
with no exceptions made for pipes (the word pipe does not even appear in the writev specification).
In practice on Linux, maybe not. The equivalence of writev to a single write only holds for kernel file types that implement the "new" (as of 15 years ago or so) iov-based read/write backends. Some, like terminals, only implement the old interfaces that take a single buffer, and Linux emulates the writev (or readv) as multiple write calls (or, respectively, read calls). The readv case is also problematic, as you can see in this commit to musl libc.
I'm not sure whether pipes are affected by this issue or not. You'd have to dig into the kernel sources.
Upvotes: 2