Reputation: 1875
I'm writing a program where multiple threads might read from a file simultaneously. No threads are writing to the file, but they might each copy its contents to separate memory segments.
To implement this, I'm required to use an API that gives me a file descriptor for the file I want to read. I'm reading chunks of the file with C's read() function. The man page says that "On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number." I can't find any definitive information, though, on whether the advancement of the file position is thread-safe.
Say I have thread T1 and thread T2 reading 1 byte of the file at a time. If read() is thread-safe, I would expect the following:
T1: read() => file position == 1
T2: read() => file position == 1
T1: read() => file position == 2
T2: read() => file position == 2
...
But I'm worried that if it's not thread-safe, then the following could happen:
T1: read() => file position == 1
T2: read() => file position == 2
T1: read() => file position == 3
T2: read() => file position == 4
...
If it helps, each thread would be using the same file descriptor. In other words, it's the API that opens the file using open(). The threads that are reading the file then fetch that file descriptor based on a client request. If each thread stores its own information about the file position, then it should be fine. I just can't find any information on what holds the file position, and where read() figures out what it is.
Upvotes: 4
Views: 7863
Reputation: 182753
Your understanding of thread safety is incorrect or you are misapplying it.
T1: read() => file position == 1
T2: read() => file position == 1
T1: read() => file position == 2
T2: read() => file position == 2
...
Here, we have T1 and T2 invoking read at the same time and we get a result that could not possibly occur regardless of which order the read operations take place in. This is the canonical example of a race occurring when a function is not thread safe.
Roughly speaking, a function is thread safe if invoking it concurrently produces sane results, equivalent to those you get when you invoke it non-concurrently. If a single thread calling read never processes the same data twice, then read is thread safe if two threads calling it also never process the same data twice.
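To make that criterion concrete, here is a minimal sketch, assuming a regular file on a system that honors the POSIX atomicity rules for read(). Two threads read one byte at a time from the same descriptor, and the check confirms that no byte was delivered twice. The names tally, read_bytes and check_each_byte_read_once are hypothetical, as is the temporary file path.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

enum { FILE_SIZE = 256 };

/* Hypothetical per-thread record of which byte values this thread received. */
struct tally {
    int fd;                 /* descriptor shared by both threads */
    int seen[FILE_SIZE];    /* seen[v] counts how often value v was read */
};

/* Thread body: read one byte at a time from the shared descriptor until EOF. */
static void *read_bytes(void *arg)
{
    struct tally *t = arg;
    unsigned char byte;

    assert(t != NULL);
    while (read(t->fd, &byte, 1) == 1)
        t->seen[byte]++;
    return NULL;
}

/* Returns 0 if every byte of the file was delivered to exactly one read(). */
int check_each_byte_read_once(void)
{
    char path[] = "/tmp/read_once_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* The file holds each byte value 0..255 exactly once. */
    unsigned char data[FILE_SIZE];
    for (int i = 0; i < FILE_SIZE; i++)
        data[i] = (unsigned char)i;
    if (write(fd, data, sizeof data) != (ssize_t)sizeof data ||
        lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    struct tally t1 = { .fd = fd }, t2 = { .fd = fd };   /* seen[] starts zeroed */
    pthread_t a, b;
    if (pthread_create(&a, NULL, read_bytes, &t1) != 0) {
        close(fd);
        return -1;
    }
    if (pthread_create(&b, NULL, read_bytes, &t2) != 0) {
        pthread_join(a, NULL);
        close(fd);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    close(fd);

    /* Thread safe in the sense above: no value was seen twice or skipped. */
    for (int i = 0; i < FILE_SIZE; i++)
        if (t1.seen[i] + t2.seen[i] != 1)
            return -1;
    return 0;
}
```

How the bytes are split between the two threads is up to the scheduler; the sketch only checks that together they cover the file exactly once.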
Upvotes: 0
Reputation: 118118
Note read:
The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified. (emphasis mine)
From the same reference:
ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset);
...
The pread() function shall be equivalent to read(), except that it shall read from a given position in the file without changing the file pointer. The first three arguments to pread() are the same as read() with the addition of a fourth argument offset for the desired position inside the file. An attempt to perform a pread() on a file that is incapable of seeking shall result in an error.
Each thread can keep track of its own offset and pass it to pread() on every call.
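A minimal sketch of that idea, assuming a seekable regular file: each thread owns a small cursor that holds the shared descriptor plus a private offset, and reads through pread(). The struct reader type, reader_next and demo_private_offsets are hypothetical names for illustration. The demo interleaves two cursors on one descriptor in a single thread to show that each sees the file from the beginning and that the descriptor's own position never moves.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread cursor: each thread owns one of these. */
struct reader {
    int   fd;       /* descriptor shared by all threads */
    off_t offset;   /* private position, not stored in the descriptor */
};

/* Read up to len bytes at the cursor's private offset and advance it.
   Returns the byte count, 0 at end of file, or -1 on error (errno set). */
ssize_t reader_next(struct reader *r, void *buf, size_t len)
{
    assert(r != NULL);
    ssize_t n = pread(r->fd, buf, len, r->offset);
    if (n > 0)
        r->offset += n;
    return n;
}

/* Demo: two cursors on one descriptor. Returns 0 if both see "a" then "b"
   and the descriptor's shared position is left untouched. */
int demo_private_offsets(void)
{
    char path[] = "/tmp/pread_demo_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "abcdef", 6) != 6) {
        close(fd);
        return -1;
    }

    /* write() left the shared position at the end of the data. */
    off_t shared_before = lseek(fd, 0, SEEK_CUR);

    struct reader t1 = { fd, 0 }, t2 = { fd, 0 };
    char a, b, c, d;
    int ok = reader_next(&t1, &a, 1) == 1     /* T1 at private offset 1 */
          && reader_next(&t2, &b, 1) == 1     /* T2 at private offset 1 */
          && reader_next(&t1, &c, 1) == 1     /* T1 at private offset 2 */
          && reader_next(&t2, &d, 1) == 1     /* T2 at private offset 2 */
          && a == 'a' && b == 'a' && c == 'b' && d == 'b'
          && t1.offset == 2 && t2.offset == 2
          && lseek(fd, 0, SEEK_CUR) == shared_before;
    close(fd);
    return ok ? 0 : -1;
}
```

This gives the interleaving the question hoped for (T1 at 1, T2 at 1, T1 at 2, T2 at 2), because the position lives in each thread's cursor rather than in the descriptor.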
Upvotes: 3
Reputation: 215193
read itself is thread-safe, but that doesn't necessarily mean the things you want to do with it are thread-safe. Per POSIX (2.9.7 Thread Interactions with Regular File Operations):
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links:
...
(read is in the list that follows.)
Among other things, this means that reading the data and advancing the current file position are atomic with respect to each other, and each byte that's read will be read exactly once. However, there are other considerations that can complicate things for you, especially:
Short reads: read(fd, buf, n) need not read n bytes. It could read anywhere between 1 and n bytes, and when you call it again to read the remainder, that second read is no longer atomic with respect to the first one.
Other file types: POSIX only guarantees atomicity of read for regular files and perhaps a few other types. Specific systems like Linux probably have stronger guarantees, but I would be cautious.
It may be preferable to use the pread function (where you can specify a file offset to read from without having to seek to that position, and where the resulting file position remains unchanged) or perform locking around your file accesses to avoid such problems.
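As a rough illustration of the pread route, the sketch below loops until the requested range has been read, so a short read only means another call at the next explicit offset and other threads cannot move the position in between. The helper pread_full and the demo function are hypothetical, and the sketch assumes a regular, seekable file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes starting at offset, retrying after
   short reads. Returns the byte count (less than len only at end of file)
   or -1 on error with errno set. The shared file position is never used. */
ssize_t pread_full(int fd, void *buf, size_t len, off_t offset)
{
    assert(buf != NULL || len == 0);
    char *dst = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, dst + done, len - done, offset + (off_t)done);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        done += (size_t)n;      /* short read: continue where it stopped */
    }
    return (ssize_t)done;
}

/* Demo on a temporary 10-byte file; returns 0 if results are as expected. */
int demo_pread_full(void)
{
    char path[] = "/tmp/pread_full_XXXXXX";   /* assumed writable location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }

    char buf[16];
    int ok = pread_full(fd, buf, 4, 3) == 4             /* bytes 3..6 */
          && memcmp(buf, "3456", 4) == 0
          && pread_full(fd, buf, sizeof buf, 8) == 2    /* stops at EOF */
          && memcmp(buf, "89", 2) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Each thread would call pread_full with its own offset. The locking alternative would instead hold a mutex across the whole loop of read() calls.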
Upvotes: 5