tmoore82

Reputation: 1875

Is C read() Thread Safe?

I'm writing a program where multiple threads might read from a file simultaneously. No threads are writing to the file, but they might each copy its contents to separate memory segments.

To implement this, I'm required to use an API that gives me a file descriptor for the file I want to read. I'm reading chunks of the file with the read() function. The man page says that, "On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number." I can't find any definitive information, though, on whether the advancement of the file position is thread-safe.

Say I have thread T1 and thread T2 reading 1 byte of the file at a time. If read() is thread-safe, I would expect the following:

T1: read() => file position == 1
T2: read() => file position == 1
T1: read() => file position == 2
T2: read() => file position == 2
...

But I'm worried that if it's not thread-safe, then the following could happen:

T1: read() => file position == 1
T2: read() => file position == 2
T1: read() => file position == 3
T2: read() => file position == 4
...

If it helps, each thread would be using the same file descriptor. In other words, it's the API that opens the file using open(). The threads that are reading the file then fetch that file descriptor based on a client request. If each thread stores its own information about the file position, then it should be fine. I just can't find any information on what holds the file position, and where read() figures out what it is.

Upvotes: 4

Views: 7863

Answers (3)

David Schwartz

Reputation: 182753

Your understanding of thread safety is incorrect or you are misapplying it.

T1: read() => file position == 1
T2: read() => file position == 1
T1: read() => file position == 2
T2: read() => file position == 2
...

Here, we have T1 and T2 invoking read at the same time and we get a result that could not possibly occur regardless of which order the read operations take place in. This is the canonical example of a race occurring when a function is not thread safe.

Roughly speaking, a function is thread safe if invoking it concurrently produces sane results, equivalent to those you get when you invoke it non-concurrently. If a single thread calling read never processes the same data twice, then read is thread safe if two threads calling it also never process the same data twice.
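To make that concrete, here is a minimal sketch (not part of the original answer) in which two threads pull single bytes from one shared descriptor and the program then checks that every byte was consumed exactly once. The names struct counter, drain and demo_shared_position are hypothetical, and the sketch assumes a POSIX system where concurrent read() calls on a regular file behave as if they were serialized.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-thread tally of which byte values this thread consumed. */
struct counter {
    int fd;          /* descriptor shared by both threads */
    int seen[256];   /* seen[v] = how often this thread read byte value v */
};

/* Read one byte at a time until end of file (or an error). */
static void *drain(void *arg)
{
    struct counter *c = arg;
    unsigned char b;
    while (read(c->fd, &b, 1) == 1)
        c->seen[b]++;
    return NULL;
}

/* Returns 0 if every byte was consumed exactly once across both threads,
 * 1 if a byte was duplicated or skipped, -1 if the setup failed. */
int demo_shared_position(void)
{
    char path[] = "/tmp/read_demo_XXXXXX";
    unsigned char data[256];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file vanishes once fd is closed */

    for (int i = 0; i < 256; i++)
        data[i] = (unsigned char)i;      /* each byte value appears once */
    if (write(fd, data, sizeof data) != (ssize_t)sizeof data ||
        lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }

    struct counter c[2] = { { .fd = fd }, { .fd = fd } };
    pthread_t t[2];
    int created = 0;
    while (created < 2 &&
           pthread_create(&t[created], NULL, drain, &c[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    close(fd);
    if (created < 2)
        return -1;

    for (int i = 0; i < 256; i++)
        if (c[0].seen[i] + c[1].seen[i] != 1)
            return 1;
    return 0;
}
```

Note what this implies for the question: the two threads split the file between them. Neither thread sees the whole file, because the position belongs to the shared open file rather than to the thread.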

Upvotes: 0

Sinan Ünür

Reputation: 118118

Note, from the POSIX specification of read:

The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified. (emphasis mine)

From the same reference:

ssize_t pread(int fildes, void *buf, size_t nbyte, off_t offset);

...

The pread() function shall be equivalent to read(), except that it shall read from a given position in the file without changing the file pointer. The first three arguments to pread() are the same as read() with the addition of a fourth argument offset for the desired position inside the file. An attempt to perform a pread() on a file that is incapable of seeking shall result in an error.

Each thread can keep track of its own offset, and specify it.
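As a rough sketch of that idea (not from the original answer), each thread below keeps a private offset and passes it to pread(), so the position stored in the shared descriptor is never consulted. struct reader, read_whole_file, demo_private_offsets and the fixed FILE_LEN are hypothetical names chosen for illustration.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_LEN 26  /* size of the demo file; an assumption of this sketch */

/* Hypothetical per-thread state: each reader owns its offset and buffer. */
struct reader {
    int fd;                  /* descriptor shared by all threads */
    off_t offset;            /* private position, not stored in the descriptor */
    char buf[FILE_LEN + 1];
};

/* Copy the file one byte at a time, always naming the offset explicitly. */
static void *read_whole_file(void *arg)
{
    struct reader *r = arg;
    while (r->offset < FILE_LEN) {
        ssize_t n = pread(r->fd, r->buf + r->offset, 1, r->offset);
        if (n <= 0)          /* end of file or error */
            break;
        r->offset += n;
    }
    return NULL;
}

/* Returns 0 if both threads obtained a complete copy, -1 otherwise. */
int demo_private_offsets(void)
{
    static const char data[] = "abcdefghijklmnopqrstuvwxyz";
    char path[] = "/tmp/pread_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file vanishes once fd is closed */
    if (write(fd, data, FILE_LEN) != FILE_LEN) {
        close(fd);
        return -1;
    }
    /* No lseek() back to the start: pread() ignores the file position. */

    struct reader r[2] = { { .fd = fd }, { .fd = fd } };
    pthread_t t[2];
    int created = 0;
    while (created < 2 &&
           pthread_create(&t[created], NULL, read_whole_file, &r[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    close(fd);
    if (created < 2)
        return -1;

    for (int i = 0; i < 2; i++)
        if (r[i].offset != FILE_LEN || memcmp(r[i].buf, data, FILE_LEN) != 0)
            return -1;
    return 0;
}
```

Here both threads end up with the full contents, which is the behavior the question asks for.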

Upvotes: 3

R.. GitHub STOP HELPING ICE

Reputation: 215193

read itself is thread-safe, but that doesn't necessarily mean the things you want to do with it are thread-safe. Per POSIX (2.9.7 Thread Interactions with Regular File Operations):

All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links:

...

(read is in the list that follows.)

Among other things, this means that reading the data and advancing the current file position are atomic with respect to each other, and each byte that's read will be read exactly once. However, there are other considerations that can complicate things for you, especially:

  • Short reads: read(fd, buf, n) need not read n bytes. It could read anywhere between 1 and n bytes, and when you call it again to read the remainder, that second read is no longer atomic with respect to the first one.

  • Other file types: POSIX only guarantees atomicity of read for regular files and perhaps a few other types. Specific systems like Linux probably have stronger guarantees, but I would be cautious.

It may be preferable to use the pread function (where you can specify a file offset to read from without having to seek to that position, and where the resulting file position remains unchanged) or perform locking around your file accesses to avoid such problems.
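A minimal sketch of the pread approach that also copes with short reads might look like the following. pread_full is a hypothetical helper name, not a standard function; it keeps calling pread() with an explicitly advanced offset until the request is satisfied, end of file is reached, or an error occurs.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes starting at offset, retrying
 * after short reads and EINTR. Returns the number of bytes read (less
 * than n only at end of file) or -1 on error. The descriptor's own file
 * position is never used or changed. */
ssize_t pread_full(int fd, void *buf, size_t n, off_t offset)
{
    char *p = buf;
    size_t done = 0;

    while (done < n) {
        ssize_t got = pread(fd, p + done, n - done, offset + (off_t)done);
        if (got < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before any data: retry */
            return -1;
        }
        if (got == 0)
            break;               /* end of file */
        done += (size_t)got;
    }
    return (ssize_t)done;
}
```

Because every call names its own offset, a retry after a short read cannot be disturbed by another thread reading through the same descriptor, so no lock is needed for readers that only use this helper.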

Upvotes: 5
