Reputation: 11
I have a program that reads a file with fgetc(), and one question I was asked is: "Can using mmap() and munmap() reduce the number of cache misses?"
To test it, I wrote a quick-and-dirty program that, depending on a command-line argument, reads a file character by character either with fgetc() or through the address returned by mmap(). I ran it under valgrind --tool=cachegrind to measure the number of cache misses. mmap() does not reduce the number of cache misses; it increases it.
I have searched the Internet all day for resources that would help me understand why. I can see that mmap() loads the file into a contiguous memory region and that we read it from the first character to the last, so why does it increase cache misses?
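For reference, a comparison of this kind might look roughly like the sketch below. It is not the original test program: the function names sum_with_fgetc and sum_with_mmap are hypothetical, and summing the bytes is only there so that every character is actually read.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the file one character at a time through stdio.
 * Returns the sum of all byte values, or -1 on error. */
long sum_with_fgetc(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long sum = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        sum += c;

    fclose(fp);
    return sum;
}

/* Map the whole file and read it one byte at a time through the pointer.
 * Returns the sum of all byte values, or -1 on error. */
long sum_with_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}
```

A small driver could pick one of the two functions based on argv[1] and then be run under valgrind --tool=cachegrind once per mode to compare the reported misses.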
I am looking for any particular resources or explanation that might help me understand what's really going on.
Thanks in advance.
Upvotes: 1
Views: 257
Reputation: 1
There are several caches. I guess you are talking about the kernel file system cache (or page cache), not about the CPU cache.
With memory mapping, you could use the madvise(2) syscall to give hints (after mmap), or pass MAP_POPULATE to mmap(2). For file I/O, you could use posix_fadvise(2) to give hints (before read).
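As a rough illustration of those hints on Linux, here is a minimal sketch. The helper names map_with_hints and open_with_hints are made up for this example, and the choice of MADV_SEQUENTIAL and POSIX_FADV_SEQUENTIAL is an assumption that the file is read front to back; whether any of this helps depends on the workload.

```c
#define _GNU_SOURCE             /* for MAP_POPULATE and madvise() on glibc */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only, prefaulting its pages (MAP_POPULATE) and telling
 * the kernel we will read sequentially. Stores the size in *len.
 * Returns MAP_FAILED on error or for an empty file. */
void *map_with_hints(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    size_t size = (size_t)st.st_size;
    void *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    /* Advice only: a failure here is not fatal, so the result is ignored. */
    (void)madvise(p, size, MADV_SEQUENTIAL);

    *len = size;
    return p;
}

/* Open a file for read(2) and hint, before reading, that access will be
 * sequential. Returns the file descriptor, or -1 on error. */
int open_with_hints(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with length 0 means "the whole file". */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
```

The caller is expected to munmap() the mapping or close() the descriptor when done.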
If you are using stdio(3), you probably want a larger buffer (e.g. 64 KiB or more); see setvbuf(3). Note that GNU glibc's fopen(3) may be able to mmap the file when given the m extension in the mode.
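A minimal sketch of the larger-buffer suggestion, assuming a 64 KiB buffer and a hypothetical function name sum_with_large_buffer. The buffer is allocated by the caller because the size argument is not guaranteed to be honoured when setvbuf() is given a NULL buffer.

```c
#include <stdio.h>
#include <stdlib.h>

#define STREAM_BUF_SIZE (64 * 1024)   /* 64 KiB, as suggested above */

/* Read a file with fgetc() through a 64 KiB fully buffered stream.
 * Returns the sum of all byte values, or -1 on error. */
long sum_with_large_buffer(const char *path)
{
    /* On glibc, a mode such as "rm" asks fopen() to try mmap instead. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *buf = malloc(STREAM_BUF_SIZE);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, STREAM_BUF_SIZE) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    long sum = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        sum += c;

    fclose(fp);     /* the buffer must outlive the stream... */
    free(buf);      /* ...so it is freed only after fclose() */
    return sum;
}
```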
See also readahead(2) and linuxatemyram.
Don't hope for miracles: the bottleneck is the hardware disk I/O.
Upvotes: 3