Reputation: 11638
I'm using mmap to read/write a file that I'm using in a database-like way. The file is much larger than available RAM. My use case is single-process, multi-threaded. How can I maximize the performance of accessing the mmap'd memory?
I'm assuming I should use MAP_PRIVATE rather than MAP_SHARED to take advantage of copy-on-write.
Is there any performance advantage to using MAP_POPULATE and/or MAP_NONBLOCK?
Are there any other performance-related things I should consider when using mmap?
Upvotes: 4
Views: 2985
Reputation: 136208
mmap manipulates the process's virtual address space and the page table entries (PTEs) in the CPU and RAM, and that is not a cheap operation.
Linus Torvalds has written a number of times about the drawbacks of mmap.
One way to minimize the cost of mmap is to keep mapping files (or parts of them) into the same virtual address range, so that no PTE manipulation is necessary.
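One possible reading of this advice, sketched in C: reserve a fixed window of address space once, then map successive chunks of the file onto it with MAP_FIXED. The helper names (reserve_window, map_into_window, byte_at) are hypothetical, and the sketch assumes Linux and a window length that is a multiple of the page size. Whether this actually saves page-table work for a given workload is something to measure rather than assume.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve `len` bytes of address space with no access
 * rights, to serve as a fixed window for later file mappings. */
static void *reserve_window(size_t len)
{
    return mmap(NULL, len, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}

/* Hypothetical helper: map `len` bytes of `fd`, starting at `offset`, onto the
 * window. MAP_FIXED replaces whatever was mapped at `window` before.
 * `offset` must be a multiple of the page size. */
static void *map_into_window(void *window, size_t len, int fd, off_t offset)
{
    assert(window != MAP_FAILED);
    return mmap(window, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, offset);
}

/* Hypothetical helper: return the byte at file position `pos` by sliding the
 * window to the chunk that contains it. Returns -1 if the mapping fails.
 * The file must be long enough to cover `pos`. */
static int byte_at(void *window, size_t win_len, int fd, off_t pos)
{
    off_t base = pos - (pos % (off_t)win_len);
    unsigned char *p = map_into_window(window, win_len, fd, base);
    if (p == MAP_FAILED)
        return -1;
    return p[pos - base];
}
```

In a multi-threaded program the window would need its own synchronization (or one window per thread), because remapping it invalidates pointers that other threads may still hold.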
mmap without MAP_POPULATE reserves the virtual address space in the process but doesn't back it with physical memory pages, so a thread triggers a page fault when it accesses a page for the first time, and the kernel handles that fault by mapping the actual physical page. MAP_POPULATE allows you to avoid those page faults, but mmap may take longer to return.
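As a rough illustration, the sketch below maps a file with or without MAP_POPULATE. The helper name map_file and its prefault parameter are made up for this example, and the code assumes Linux, where MAP_POPULATE is available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read/write with MAP_SHARED.
 * If `prefault` is non-zero, MAP_POPULATE asks the kernel to fault the pages
 * in during mmap() rather than on first access.
 * Returns MAP_FAILED on error; on success *len_out holds the mapped length. */
static void *map_file(const char *path, int prefault, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    int flags = MAP_SHARED;
    if (prefault)
        flags |= MAP_POPULATE;  /* Linux-specific */

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   flags, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```

For a file much larger than RAM, prefaulting the whole mapping is unlikely to help; it is probably only worth trying on a sub-range that fits in memory.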
MAP_LOCKED makes sure that the mapped pages stay in RAM and don't get swapped out.
You may also like to experiment with MAP_HUGETLB and one of the MAP_HUGE_2MB or MAP_HUGE_1GB flags. If suitable for your application, huge pages minimize the number of TLB misses.
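A minimal sketch of trying huge pages, with a fallback to normal pages when none are available. It uses an anonymous mapping because, as far as I know, MAP_HUGETLB applies to anonymous memory and hugetlbfs files rather than to ordinary file-backed mappings. The helper name alloc_scratch and the 2 MiB size constant are assumptions for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed huge page size (the common default on x86-64). */
#define HUGE_2MB ((size_t)2 * 1024 * 1024)

/* Hypothetical helper: allocate `len` bytes of anonymous memory, preferring
 * huge pages. `len` must be a multiple of HUGE_2MB.
 * *used_huge is set to 1 if the huge-page mapping succeeded, else 0.
 * Returns MAP_FAILED if both attempts fail. */
static void *alloc_scratch(size_t len, int *used_huge)
{
    assert(len > 0 && len % HUGE_2MB == 0);

    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   flags | MAP_HUGETLB, -1, 0);
    *used_huge = (p != MAP_FAILED);

    /* No huge pages reserved (or not permitted): fall back to normal pages. */
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p;
}
```

Huge pages usually have to be reserved by the administrator first (for example through /proc/sys/vm/nr_hugepages), which is why the fallback path matters.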
Try binding threads to the same NUMA node with numactl to make sure that threads only access local NUMA memory. E.g. numactl --membind=0 --cpunodebind=0 <app>.
MAP_PRIVATE vs MAP_SHARED only matters if you'd like to modify the mapped pages. MAP_PRIVATE doesn't propagate your modifications to the file or to other processes' mappings of that file.
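The difference can be checked with a small experiment: write one byte through a mapping, then read the same byte back from the file with pread(). The helper name write_via_mapping is hypothetical, and the sketch assumes Linux and a file that already holds at least one byte.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store `value` in the first byte of the file at `path`
 * through a mapping created with `map_flag` (MAP_PRIVATE or MAP_SHARED), then
 * read that byte back through the file descriptor rather than the mapping.
 * Returns the byte seen in the file, or -1 on error. */
static int write_via_mapping(const char *path, int map_flag, char value)
{
    assert(map_flag == MAP_PRIVATE || map_flag == MAP_SHARED);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flag, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = value;    /* with MAP_PRIVATE this touches a private copy only */
    munmap(p, len);

    char seen = 0;
    ssize_t n = pread(fd, &seen, 1, 0);
    close(fd);
    return n == 1 ? (unsigned char)seen : -1;
}
```

With a file whose first byte is 'a', a MAP_PRIVATE write of 'x' should leave the file reading 'a', while a MAP_SHARED write of 'y' should make it read 'y'. So for a database-like file that must persist its writes, MAP_SHARED is the flag that carries modifications back to the file.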
Upvotes: 3