Thomas Johnson

Reputation: 11638

How can I maximize mmap performance?

I'm using mmap to read/write a file that I'm using in a database-like way. The file is much larger than available RAM. My use case is single-process, multi-threaded. How can I maximize the performance of accessing the mmap'd memory?

I'm assuming I should use MAP_PRIVATE rather than MAP_SHARED to take advantage of copy-on-write.

Is there any performance advantage to using MAP_POPULATE and/or MAP_NONBLOCK?

Are there any other performance-related things I should consider when using mmap?

Upvotes: 4

Views: 2985

Answers (1)

Maxim Egorushkin

Reputation: 136208

mmap manipulates the process's virtual address space and its page-table entries (PTEs), which live in RAM and are cached by the CPU in its TLB, and that is not a cheap operation.

Linus Torvalds has commented a number of times on the drawbacks of mmap.

One way to minimize the cost of mmap is to keep files (or parts of them) mapped into the same virtual address range, rather than repeatedly mapping and unmapping them, so that the PTEs don't have to be torn down and rebuilt.
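
Here is a rough sketch of that advice. It assumes a 64-bit Linux system, where the virtual address space is large enough to map a file bigger than RAM. The file is mapped once at startup, and every later access is served from that single mapping. The struct db_map and the db_map_open, db_map_at and db_map_close helpers are hypothetical names for illustration, not part of any library.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle: the whole file is mapped once, up front. */
struct db_map {
    unsigned char *base;  /* start of the mapping */
    size_t len;           /* length of the mapping in bytes */
};

/* Map the entire file once. The kernel pages data in and out on demand,
 * so the file may be larger than RAM. Returns 0 on success, -1 on error. */
static int db_map_open(struct db_map *m, const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {  /* mmap rejects length 0 */
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    m->base = p;
    m->len = (size_t)st.st_size;
    return 0;
}

/* Every access reuses the existing mapping: plain pointer arithmetic,
 * no mmap/munmap call and no page-table teardown per request. */
static unsigned char *db_map_at(const struct db_map *m, size_t offset)
{
    return offset < m->len ? m->base + offset : NULL;
}

static void db_map_close(struct db_map *m)
{
    munmap(m->base, m->len);
    m->base = NULL;
    m->len = 0;
}
```

Worker threads then read or write bytes through db_map_at(&m, offset). They never call mmap or munmap on the hot path.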

mmap without MAP_POPULATE reserves the virtual address range in the process but doesn't back it with physical memory pages. The first time a thread accesses a page, the CPU raises a page-fault exception. The kernel handles it by reading the page in from the file (or finding it in the page cache) and mapping it. MAP_POPULATE lets you avoid those page faults by pre-faulting the pages up front, but mmap may then take longer to return.
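
A minimal sketch of MAP_POPULATE, assuming Linux. Since the file is much larger than RAM, it probably only makes sense to pre-fault the window you are about to work on, not the whole file. The struct window and map_window are hypothetical names. The helper widens the requested range down to a page boundary because mmap requires a page-aligned file offset.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one mapped window of the file. */
struct window {
    void *base;           /* page-aligned start, pass to munmap */
    size_t maplen;        /* length actually mapped, pass to munmap */
    unsigned char *data;  /* first byte the caller asked for */
};

/* Map [offset, offset + len) of fd read-only. If prefault is non-zero,
 * MAP_POPULATE makes the kernel fault the pages in before mmap returns,
 * so later accesses to this window don't take page faults.
 * Returns 0 on success, -1 on error. */
static int map_window(struct window *w, int fd, off_t offset, size_t len,
                      int prefault)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - offset % page;  /* round down to a page */
    size_t slack = (size_t)(offset - aligned);

    int flags = MAP_SHARED;
    if (prefault)
        flags |= MAP_POPULATE;

    void *p = mmap(NULL, len + slack, PROT_READ, flags, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    w->base = p;
    w->maplen = len + slack;
    w->data = (unsigned char *)p + slack;
    return 0;
}
```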

MAP_LOCKED locks the mapped pages in RAM, in the same way as mlock(2), so that they don't get paged out.
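
A sketch of MAP_LOCKED for a small, frequently accessed region such as an index, assuming Linux. Locking the whole file is not an option in this case anyway, since it is larger than RAM. Locked memory counts against RLIMIT_MEMLOCK, so this hypothetical map_hot_region helper falls back to an ordinary mapping when the kernel refuses to lock.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Try to map a small, hot region with MAP_LOCKED so its pages stay resident.
 * If locking is refused (EAGAIN: over the memory-lock limit; EPERM: the
 * process may not lock memory at all), fall back to a normal mapping.
 * *locked reports which kind of mapping was returned. */
static void *map_hot_region(int fd, off_t offset, size_t len, int *locked)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_LOCKED, fd, offset);
    if (p != MAP_FAILED) {
        *locked = 1;
        return p;
    }
    if (errno != EAGAIN && errno != EPERM)
        return MAP_FAILED;  /* some other error: report it to the caller */

    *locked = 0;
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
}
```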

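You may also like to experiment with MAP_HUGETLB combined with one of the MAP_HUGE_2MB or MAP_HUGE_1GB flags. If suitable for your application, huge pages minimize the number of TLB misses. Note that MAP_HUGETLB works for anonymous mappings and for files on hugetlbfs, not for files on ordinary filesystems. It also needs huge pages to be available in the hugetlb pool (see /proc/sys/vm/nr_hugepages).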
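
Because MAP_HUGETLB does not apply to regular files, the place it could help in this setting is an anonymous in-memory structure, such as a cache or an index. This sketch assumes Linux with 2 MiB huge pages. It tries huge pages first and falls back to normal pages. alloc_huge is a hypothetical helper. The MAP_HUGE_SHIFT and MAP_HUGE_2MB fallback definitions use the values from the Linux UAPI headers, in case the C library headers don't provide them.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_2MB
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)  /* log2(2 MiB) = 21 */
#endif

#define HUGE_2MB ((size_t)2 * 1024 * 1024)

/* Allocate an anonymous read/write buffer of at least len bytes, backed by
 * 2 MiB huge pages if the hugetlb pool can supply them, else by normal pages.
 * *maplen receives the length to pass to munmap (rounded up to 2 MiB for
 * huge pages), *huge reports which kind was used. */
static void *alloc_huge(size_t len, size_t *maplen, int *huge)
{
    size_t rounded = (len + HUGE_2MB - 1) & ~(HUGE_2MB - 1);
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                   -1, 0);
    if (p != MAP_FAILED) {
        *huge = 1;
        *maplen = rounded;
        return p;
    }

    /* Typically ENOMEM when no huge pages are reserved. */
    *huge = 0;
    *maplen = len;
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```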

Try binding your threads and their memory to the same NUMA node with numactl, so that threads only access local NUMA memory, e.g. numactl --membind=0 --cpunodebind=0 <app>.

MAP_PRIVATE vs MAP_SHARED only matters if you'd like to modify the mapped pages. MAP_PRIVATE doesn't propagate your modifications to the file or to other processes' mappings of that file. For a database-like file whose writes must persist, you therefore need MAP_SHARED.
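
To see the difference concretely, this Linux sketch writes one byte through a mapping and then reads the file back with pread. With MAP_PRIVATE the file keeps its old contents. With MAP_SHARED (plus msync) the change lands in the file. write_through_mapping is a hypothetical test helper.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the first len bytes of fd with the given sharing mode (MAP_SHARED or
 * MAP_PRIVATE), overwrite byte 0 with value through the mapping, unmap, and
 * return the byte now stored in the file (read with pread), or -1 on error. */
static int write_through_mapping(int fd, size_t len, int sharing,
                                 unsigned char value)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, sharing, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    /* MAP_SHARED: modifies the page-cache page that backs the file.
     * MAP_PRIVATE: the kernel copies the page on write; the private copy
     * is discarded at munmap and never reaches the file. */
    p[0] = value;

    if (sharing == MAP_SHARED && msync(p, len, MS_SYNC) != 0) {
        munmap(p, len);
        return -1;
    }
    munmap(p, len);

    unsigned char b;
    if (pread(fd, &b, 1, 0) != 1)
        return -1;
    return b;
}
```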

Upvotes: 3
