Reputation: 113
For instance, is it safe to open a file twice and do direct-I/O writes through one fd and page-cache reads through the other?
How to define safe: write some data through the direct-I/O fd and then expect to read it back immediately through the page-cache fd.
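A minimal sketch of the experiment described above, assuming Linux, a filesystem that accepts O_DIRECT, and 4096-byte alignment (BLK is an assumption; the real alignment requirement depends on the filesystem and device). The function name and its return codes are hypothetical. It warms the page cache through a buffered fd, overwrites the same block through an O_DIRECT fd, and then checks whether the buffered fd sees the new data. A pass on one filesystem does not prove the combination is safe everywhere.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096  /* assumption: satisfies this filesystem's O_DIRECT alignment rules */

/* Hypothetical helper. Returns 0 if the buffered fd sees the direct write,
   1 if it returns stale data, -1 on other errors, and -2 if this filesystem
   rejects O_DIRECT (nothing could be tested). */
int direct_write_then_buffered_read(const char *path)
{
    char rbuf[BLK];
    void *dbuf = NULL;
    int dfd = -1, ret = -1;
    ssize_t w;

    int bfd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);  /* page-cache fd */
    if (bfd < 0)
        return -1;

    /* Write 'A's and read them back so the block is now in the page cache. */
    memset(rbuf, 'A', BLK);
    if (pwrite(bfd, rbuf, BLK, 0) != BLK || fsync(bfd) != 0 ||
        pread(bfd, rbuf, BLK, 0) != BLK)
        goto out;

    dfd = open(path, O_RDWR | O_DIRECT);                       /* direct-I/O fd */
    if (dfd < 0) {
        ret = (errno == EINVAL) ? -2 : -1;
        goto out;
    }
    if (posix_memalign(&dbuf, BLK, BLK) != 0)  /* O_DIRECT needs an aligned buffer */
        goto out;
    memset(dbuf, 'B', BLK);
    w = pwrite(dfd, dbuf, BLK, 0);             /* bypasses the page cache */
    if (w != BLK) {
        ret = (w < 0 && errno == EINVAL) ? -2 : -1;
        goto out;
    }

    /* Read through the buffered fd: stale 'A's would mean the cached page
       was not invalidated by the direct write. */
    if (pread(bfd, rbuf, BLK, 0) != BLK)
        goto out;
    ret = (rbuf[0] == 'B' && rbuf[BLK - 1] == 'B') ? 0 : 1;

out:
    free(dbuf);
    if (dfd >= 0)
        close(dfd);
    close(bfd);
    return ret;
}
```

Run it against a file on the filesystem you care about; a result of -2 only means O_DIRECT is unavailable there, not that mixing is safe.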
Upvotes: 1
Views: 2126
Reputation: 11
I want to add that while it's indeed safe to mix direct writes and buffered reads for files on most filesystems, it's UNSAFE with raw block devices.
The Linux kernel's filesystem code invalidates the page cache after direct writes, but the block device pseudo-filesystem code doesn't.
You can compare __blockdev_direct_IO in fs/direct-io.c ("a library function for use by filesystem drivers") with __blkdev_direct_IO in block/fops.c (the direct I/O handler for block device files). The first one calls kiocb_invalidate_post_direct_write (in earlier kernel versions the same logic was not factored out into a separate function); the second one doesn't.
By the way, the code also suggests that mixing direct and buffered I/O isn't 100% safe on filesystems either, because an invalidation error isn't treated as a write error: the write succeeds and the kernel only prints a warning to dmesg, "Page cache invalidation failure on direct I/O". I don't know what can cause such an error, but when it happens, mixing becomes unsafe as well.
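To make the two cases concrete, here is a toy userspace model, not kernel code: struct toy_file and the toy_* functions are hypothetical, with one "disk" block and one cached copy of it. It only captures the two points made above, namely whether the cached copy is dropped after a direct write, and that a failed invalidation produces a warning while the write still reports success. It ignores locking, write-back of dirty pages and everything else the real kernel paths do.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: one storage block and one page-cache copy of it. */
struct toy_file {
    char disk[16];   /* contents on storage */
    char cache[16];  /* page-cache copy */
    bool cached;     /* true if cache holds a copy */
    bool page_busy;  /* hypothetical: invalidating this page fails */
};

/* Buffered read: serve from the cache if present, otherwise fill it from disk. */
static const char *toy_buffered_read(struct toy_file *f)
{
    if (!f->cached) {
        memcpy(f->cache, f->disk, sizeof f->cache);
        f->cached = true;
    }
    return f->cache;
}

/* Direct write: always goes to "disk". With invalidate == true the cached
   copy is dropped afterwards (the filesystem path described above); with
   invalidate == false that step is skipped (the block-device path as the
   answer describes it). A failed invalidation only prints a warning. */
static int toy_direct_write(struct toy_file *f, const char *data, bool invalidate)
{
    strncpy(f->disk, data, sizeof f->disk - 1);
    f->disk[sizeof f->disk - 1] = '\0';
    if (invalidate) {
        if (f->page_busy)
            fprintf(stderr, "Page cache invalidation failure on direct I/O\n");
        else
            f->cached = false;
    }
    return 0;  /* the write is reported as successful either way */
}
```

In this model a buffered read after toy_direct_write(f, "new", true) returns "new", while skipping the invalidation, or having it fail on a busy page, leaves the buffered reader with the stale "old" data even though the write returned success.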
Upvotes: 1
Reputation: 94455
I think a direct I/O write to a file should be fairly safe for later cached reads of that file, but those reads may be slower (the written data was not saved in the page cache and must be read from the real storage). The exact code path may depend on the filesystem used, though.
This post https://lwn.net/Articles/776801/ mentions that direct IO has invalidation semantics:
with some filesystems at least, performing a direct-I/O read on a page will force that page out of the cache
The book lists three strategies for writing in its "Write Caching" section: no-write, write-through, and write-back. Direct I/O may be the "no-write" variant of the write() syscall.
Using several fds for a single file is safe, because the data is managed by the filesystem code through the inode, and both fds point to the same inode.
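Assuming Linux/POSIX, one quick way to check the "same inode" claim is to compare st_dev and st_ino as reported by fstat(2) for two descriptors opened independently on the same path; same_inode is a hypothetical helper name.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if both descriptors refer to the same file,
   i.e. the same device and inode number according to fstat(2). */
static bool same_inode(int fd1, int fd2)
{
    struct stat a, b;
    if (fstat(fd1, &a) != 0 || fstat(fd2, &b) != 0)
        return false;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

O_DIRECT is a file status flag of the open file description (it is visible through fcntl(fd, F_GETFL)), so two fds can differ in how they perform I/O while still reaching the same inode.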
In 2013 there was a thread on the kernelnewbies mailing list https://lists.kernelnewbies.org/pipermail/kernelnewbies/2013-July/008660.html, and the TL;DR is:
From a kernel developer's perspective: The kernel driver guarantees coherency between the page-cache and data transferred using O_DIRECT. ...
- Do not worry about coherency between the page-cache and the data transferred using O_DIRECT. The kernel will invalidate the cache after an O_DIRECT write and flush the cache before an O_DIRECT read.
- Use mutexes or semaphores (or any of the numerous options [1]) to prevent the usual synchronisation problems during IPC using a shared file.
So while a direct write clears the written part of the file from the page cache, there is still a possible race between writer and reader: the page cache is cleared only after the direct I/O write() syscall ends. A mutex or another synchronization mechanism is therefore needed if your reader must get the updated data.
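A minimal sketch of such synchronization, assuming Linux and two processes: a pipe stands in for the mutex or semaphore, the writer sends one byte only after its O_DIRECT pwrite() has returned, and the reader blocks on that byte before its buffered read. BLK, writer and read_after_signal are hypothetical names, and 4096-byte alignment is an assumption. If the filesystem rejects O_DIRECT at open() time, the writer falls back to a plain write so the sketch still runs.

```c
#define _GNU_SOURCE              /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLK 4096  /* assumption: satisfies O_DIRECT alignment here */

/* Writer process: overwrite block 0 with 'B' through an O_DIRECT fd, then
   send one byte on the pipe to tell the reader that write() has returned. */
static void writer(const char *path, int signal_fd)
{
    void *buf;
    int fd = open(path, O_WRONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)      /* filesystem without O_DIRECT support */
        fd = open(path, O_WRONLY);      /* fall back so the demo still runs */
    if (fd < 0 || posix_memalign(&buf, BLK, BLK) != 0)
        _exit(1);
    memset(buf, 'B', BLK);
    if (pwrite(fd, buf, BLK, 0) != BLK)
        _exit(1);
    close(fd);
    char token = 1;
    if (write(signal_fd, &token, 1) != 1)  /* "release the lock" */
        _exit(1);
    _exit(0);
}

/* Reader: warm the page cache with old data ('A'), start the writer, wait
   for its signal, then read through the page cache. Returns the byte read
   at offset 0, or -1 on error. */
int read_after_signal(const char *path)
{
    char block[BLK], token, c;
    int pfd[2], status, ret = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);  /* buffered fd */
    if (fd < 0)
        return -1;
    memset(block, 'A', BLK);
    if (pwrite(fd, block, BLK, 0) != BLK || fsync(fd) != 0 ||
        pread(fd, block, BLK, 0) != BLK || pipe(pfd) != 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {                     /* child never returns from writer() */
        close(pfd[0]);
        writer(path, pfd[1]);
    }
    close(pfd[1]);
    if (pid > 0) {
        /* Blocks until the writer signals (or exits); without this wait the
           read below could race with the direct write and still see 'A'. */
        if (read(pfd[0], &token, 1) == 1 && pread(fd, &c, 1, 0) == 1)
            ret = c;
        waitpid(pid, &status, 0);
    }
    close(pfd[0]);
    close(fd);
    return ret;
}
```

A pipe was picked here only because it needs no shared memory between the processes; any of the mechanisms the mailing-list reply alludes to (mutexes, semaphores, or the other options) would play the same role.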
Some sources recommend against mixing altogether. https://medium.com/databasss/on-disk-io-part-1-flavours-of-io-8e1ace1de017 says: "It is discouraged to open the same file with Direct IO and Page Cache simultaneously, since direct operations will be performed against disk device even if the data is in Page Cache, which may lead to undesired results."
Upvotes: 2