Christophe

Reputation: 2074

Read file without evicting from OS page cache

(This is intended primarily for Linux, or ideally any POSIX system.)

I'm looking for a way of reading a large number of files (any one of which might be up to 1GB by itself) with the following characteristics, as I read the pages in:

The idea is to be able to read all of these files without polluting the disk cache or evicting the current working set.

Any guidance?

Upvotes: 4

Views: 2226

Answers (4)

sourcejedi

Reputation: 3271

Using posix_fadvise() you can hint to the OS that it should drop certain file blocks from the cache. Combined with mincore(), which tells us which blocks are currently cached, we can alter applications so that they work without disturbing the buffer cache.

This delightful workaround for [un]implemented kernel features is described in detail:

http://insights.oetiker.ch/linux/fadvise/

[Edit] Implications of kernel read-ahead

For full read performance, you should make sure to drop only the pages you've already read. Otherwise you'll drop the pages that the kernel helpfully reads in advance :). (I think this would be detected as a read-ahead mis-predict, which would disable read-ahead and at least avoid lots of wasted I/O. But read-ahead is seriously helpful, so you want to avoid disabling it.)

Also, I bet if you test the pages just ahead of your last read then they always show as in-core. It won't tell you whether anyone else was using them or not. All it will show is that kernel read-ahead is working :).

The code in the linked rsync patch should be fine (ignoring the "array of all the fds" hack). It tests the whole file before the first read. That's reasonable because it only requires an in-core allocation of 1 byte per 4kB file page.
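A minimal sketch of the "test the whole file before the first read" idea, assuming Linux with glibc. The helper names cache_snapshot and drop_if_not_preexisting are made up for illustration and are not taken from the linked rsync patch. The snapshot maps the file without touching it, asks mincore() which pages are resident, and later drops only the pages that were not resident beforehand. One caveat I believe applies: some newer kernels restrict what mincore() reveals for files you neither own nor can write to, in which case every page may be reported as resident and nothing gets dropped.

```c
#define _DEFAULT_SOURCE   /* mincore() is not POSIX; glibc needs this to declare it */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd vector with one byte per file page
 * (low bit set = page was resident in the page cache), or NULL on error or
 * for an empty file. The caller frees the vector. */
unsigned char *cache_snapshot(int fd, size_t *npages)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return NULL;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t n = (len + page - 1) / page;

    /* Mapping alone does not fault any pages in, so the cache is not disturbed. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return NULL;

    unsigned char *vec = malloc(n);
    if (vec != NULL && mincore(map, len, vec) != 0) {
        free(vec);
        vec = NULL;
    }
    munmap(map, len);
    if (vec != NULL)
        *npages = n;
    return vec;
}

/* Hypothetical helper: after reading pages [first, first + count), drop only
 * those that were NOT resident in the snapshot. The caller must keep
 * first + count within the snapshot's page count. */
void drop_if_not_preexisting(int fd, const unsigned char *vec,
                             size_t first, size_t count)
{
    assert(vec != NULL);
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    for (size_t i = first; i < first + count; i++) {
        if (!(vec[i] & 1)) {
            /* Advisory only, so a failure here is deliberately ignored. */
            (void)posix_fadvise(fd, (off_t)i * page, page, POSIX_FADV_DONTNEED);
        }
    }
}
```

The intended use is: open the file, call cache_snapshot() once, then after each chunk you have finished reading call drop_if_not_preexisting() for the pages that chunk covered. Calling it only for pages behind your read position keeps the kernel's read-ahead pages intact.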

Upvotes: 2

caf

Reputation: 239011

The best way to do this is probably with posix_fadvise(). Applying the POSIX_FADV_NOREUSE flag to the entire file before reading it seems like the best fit; unfortunately this flag does nothing on current kernels.

Something that you could try is to read a chunk of data from the file, then immediately tell the kernel that you won't need that chunk again by passing the POSIX_FADV_DONTNEED flag to posix_fadvise().
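A rough sketch of that read-then-drop loop, under the assumption that the chunk size is a multiple of the page size (as far as I know, Linux ignores requests to discard partial pages). The function name read_and_drop is hypothetical. Note that this drops every page it reads, including pages that some other process had already cached before you started.

```c
#define _XOPEN_SOURCE 600   /* asks the headers to declare posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole file in `chunk`-byte pieces and, after
 * each piece, advises the kernel that the piece will not be needed again.
 * Returns the number of bytes read, or -1 on error. */
long long read_and_drop(const char *path, size_t chunk)
{
    assert(chunk > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *buf = malloc(chunk);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    off_t offset = 0;
    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0) {
        /* ... process buf[0 .. n) here ... */

        /* Only drop what has already been read, so pages the kernel fetched
         * ahead of the current position stay in the cache. The call is
         * advisory, so its return value is deliberately ignored. */
        (void)posix_fadvise(fd, offset, n, POSIX_FADV_DONTNEED);
        offset += n;
    }

    free(buf);
    close(fd);
    return n < 0 ? -1 : (long long)offset;
}
```

For example, read_and_drop("/path/to/big.file", 1 << 20) would walk the file in 1 MiB chunks; the path is a placeholder.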

Upvotes: 2

Maxim Egorushkin

Reputation: 136208

On Linux you can experiment with the O_DIRECT open() flag. From man open(2):

   O_DIRECT (Since Linux 2.4.10)
          Try  to minimize cache effects of the I/O to and from this file.
          In general this will degrade performance, but it  is  useful  in
          special  situations,  such  as  when  applications  do their own
          caching.  File I/O is done directly to/from user space  buffers.
          The O_DIRECT flag on its own makes an effort to transfer data
          synchronously, but does not give the guarantees  of  the  O_SYNC
          that  data and necessary metadata are transferred.  To guarantee
          synchronous I/O the O_SYNC must be used in addition to O_DIRECT.
          See NOTES below for further discussion.
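A hedged sketch of what an O_DIRECT read loop might look like. O_DIRECT generally requires the buffer address, the read size, and the file offset to be aligned; the 4096-byte alignment used here is an assumption, since the real requirement depends on the device and filesystem. The name read_direct and the fallback to a normal buffered open when the filesystem rejects O_DIRECT are illustrative choices of mine, not something the man page prescribes.

```c
#define _GNU_SOURCE   /* O_DIRECT is a Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0    /* not available here: behaves like a buffered open */
#endif

#define DIRECT_ALIGN 4096   /* assumed alignment, see the note above */

/* Hypothetical helper: reads the whole file, bypassing the page cache where
 * the filesystem allows it. `chunk` must be a multiple of DIRECT_ALIGN.
 * Sets *used_direct (if non-NULL) to 1 when O_DIRECT was in effect.
 * Returns the number of bytes read, or -1 on error. */
long long read_direct(const char *path, size_t chunk, int *used_direct)
{
    assert(path != NULL);
    if (chunk == 0 || chunk % DIRECT_ALIGN != 0)
        return -1;

    int direct = (O_DIRECT != 0);
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL) {
        /* Some filesystems reject O_DIRECT; fall back to buffered I/O. */
        direct = 0;
        fd = open(path, O_RDONLY);
    }
    if (fd < 0)
        return -1;

    void *buf = NULL;
    if (posix_memalign(&buf, DIRECT_ALIGN, chunk) != 0) {
        close(fd);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, chunk)) > 0) {
        /* ... process the n bytes in buf here ... */
        total += n;
        /* Assumption: for a regular file a short read means end of file.
         * Stopping here avoids issuing a read at an unaligned offset. */
        if ((size_t)n < chunk)
            break;
    }

    free(buf);
    close(fd);
    if (used_direct != NULL)
        *used_direct = direct;
    return n < 0 ? -1 : total;
}
```

As the man page excerpt warns, this will usually be slower than buffered reads, because it also gives up kernel read-ahead; larger chunk sizes help to compensate.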

Upvotes: 4

itisravi

Reputation: 3561

The page cache size changes dynamically depending on the memory requested by various processes, I/O write-back, and other activity in the system. What you can do is tune the /proc/sys/vm/swappiness value.

Upvotes: 0
