How smart is mmap?

Question

mmap can be used to share read-only memory between processes, reducing the memory footprint:

process P1 mmaps a file, uses the mapped memory -> data gets loaded into RAM
process P2 mmaps a file, uses the mapped memory -> OS re-uses the same memory

But how about this:

process P1 mmaps a file, loads it into memory, then exits.
another process P2 mmaps the same file, accesses the memory that is still hot from P1's access.

Is the data loaded again from disk? Is the OS smart enough to re-use the virtual memory even if "mmap count" dropped to zero temporarily?

Does the behaviour differ between operating systems? (I'm mostly interested in Linux and OS X.)

EDIT: In case the OS is not smart enough -- would it help if there is one "background process", keeping the file mmaped, so it never leaves the address space of at least one process?

I am of course interested in performance when I mmap and munmap the same file successively and rapidly, possibly (but not necessarily) within the same process.

EDIT2: I see answers describing completely irrelevant points at great length. To reiterate the point -- can I rely on Linux/OS X to not re-load data that already resides in memory, from previous page hits within mmaped memory segments, even though the particular region is no longer mmaped by any process?

Celada · Accepted Answer

The presence or absence of the contents of a file in memory is much less coupled to mmap system calls than you think. When you mmap a file, it doesn't necessarily load it into memory. When you munmap it (or if the process exits), it doesn't necessarily discard the pages.

There are many different things that could trigger the contents of a file to be loaded into memory: mapping it, reading it normally, executing it, attempting to access memory that is mapped to the file. Similarly, there are different things that could cause the file's contents to be removed from memory, mostly related to the OS deciding it wants the memory for something more important.

In the two scenarios from your question, consider inserting a step between steps 1 and 2:

1.5. another process allocates and uses a large amount of memory -> the mmaped file is evicted from memory to make room.

In this case the file's contents will probably have to get reloaded into memory if they are mapped again and used again in step 2.

versus:

1.5. nothing happens -> the contents of the mmaped file hang around in memory.

In this case the file's contents don't need to be reloaded in step 2.

In terms of what happens to the contents of your file, your two scenarios aren't much different. What matters far more is whether something like step 1.5 happens in between.
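A rough way to check this on your own machine is to map a file, touch every page, unmap it, and then do the same again. The sketch below is only an illustration in Python; the helper names `map_and_sum` and `demo` are made up for this example, and the timings depend entirely on what else the machine is doing. Note that the temporary file is written just before the first pass, so its pages are most likely already cached and both passes will be warm. To observe a cold first pass you would need a file that has not been read or written recently.

```python
import mmap
import os
import tempfile
import time


def map_and_sum(path):
    """Map `path` read-only, touch one byte per page, then unmap.

    Returns (checksum, elapsed_seconds). Hypothetical helper for illustration.
    """
    start = time.perf_counter()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Reading a byte from each page makes the kernel fault that page in
            # (from disk, or from memory if it is still cached).
            checksum = sum(mm[i] for i in range(0, len(mm), mmap.PAGESIZE))
    # Leaving the `with` blocks unmaps the file and closes the descriptor.
    return checksum, time.perf_counter() - start


def demo(size=8 * 1024 * 1024):
    """Create a temporary file, then map/unmap it twice and time both passes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))
        first = map_and_sum(path)   # nothing has the file mapped before this
        second = map_and_sum(path)  # nothing had it mapped in between, either
        return first, second
    finally:
        os.remove(path)


if __name__ == "__main__":
    (sum1, t1), (sum2, t2) = demo()
    print(f"first pass:  {t1:.6f} s")
    print(f"second pass: {t2:.6f} s")
```

If the second pass is not slower than the first, the pages survived the period in which no process had the file mapped, which is the "nothing happens" case of step 1.5.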

As for a background process that constantly accesses the file to keep it in memory (for example, by scanning the file and then sleeping for a short time in a loop), this would make it very likely that the file stays in memory, but you're probably better off just letting the OS make its own decision about when to evict the file and when not to.
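For completeness, the scan-and-sleep loop described above could look roughly like the following. This is a minimal sketch, not a recommendation; `touch_pages`, `keep_warm`, and the `interval_seconds` and `rounds` parameters are names invented for this example. Touching pages only marks them as recently used, so the OS can still evict them under memory pressure.

```python
import mmap
import time


def touch_pages(path):
    """Read one byte from every page of `path` via a read-only mapping.

    Returns the number of pages touched. Hypothetical helper for illustration.
    """
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), mmap.PAGESIZE):
                _ = mm[offset]  # the access is what faults the page in
                touched += 1
    return touched


def keep_warm(path, interval_seconds=5.0, rounds=None):
    """Scan the file, sleep, repeat. `rounds=None` means loop forever."""
    done = 0
    while rounds is None or done < rounds:
        touch_pages(path)
        done += 1
        time.sleep(interval_seconds)
    return done
```

A call such as `keep_warm("/path/to/data.bin")` (the path is a placeholder) would run until the process is killed. Passing `rounds` bounds the loop, which is mainly useful for testing.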
