bkth


mmap vs fgetc to avoid cache misses

I have a program that reads a file with fgetc(), and one question I was asked is: "Can we reduce the number of cache misses by using mmap() and munmap()?"

To test it, I wrote a quick-and-dirty program that, depending on a command-line argument, reads a file character by character either with fgetc() or through the address returned by mmap(). I then ran it under valgrind --tool=cachegrind to measure the number of cache misses. Using mmap does not reduce the number of cache misses; it increases it.
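A minimal sketch of such a test program, assuming Linux/POSIX. Both functions read the file byte by byte and sum the bytes so the loop cannot be optimized away. The names `sum_bytes_fgetc` and `sum_bytes_mmap` are hypothetical, not taken from the original code.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of the file using stdio's buffered fgetc(). Returns -1 on error. */
long long sum_bytes_fgetc(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long long sum = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        sum += (unsigned char)c;
    fclose(f);
    return sum;
}

/* Sum every byte of the file by walking a read-only mapping. Returns -1 on error. */
long long sum_bytes_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    long long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];            /* sequential access, one byte at a time */
    munmap((void *)p, len);
    return sum;
}
```

A `main()` (omitted here) could pick one of the two functions based on `argv[1]`; each variant can then be run under `valgrind --tool=cachegrind ./prog <mode> <file>` (the `<mode>` argument is hypothetical) and the D1/LL miss counts compared.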

I have searched the Internet all day for resources that explain why this happens. The file is mapped into a contiguous memory region and I read it sequentially from the first character to the last, so why does this increase cache misses?

I am looking for any resources or explanations that would help me understand what is really going on.

Thanks in advance.


Answers (1)

There are several caches. I guess you are talking about the kernel file system cache (or page cache), not about the CPU cache.

With memory mapping, you could use the madvise(2) syscall to give hints after mmap (or pass MAP_POPULATE to mmap(2)). With ordinary file I/O, use posix_fadvise(2) to give hints before reading.
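A minimal sketch of both kinds of hint, assuming Linux with glibc (MAP_POPULATE and madvise() are not part of POSIX). The helper names are hypothetical, and since the advice calls are only hints, their return values are ignored.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file read-only, prefault it with MAP_POPULATE (Linux-specific),
   and advise sequential access. Stores the length in *len_out.
   Returns NULL on failure or for an empty file; the caller must munmap(). */
const unsigned char *map_for_sequential_read(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    /* Hint: read ahead aggressively; pages may be dropped soon after use. */
    (void)madvise(p, len, MADV_SEQUENTIAL);
    *len_out = len;
    return p;
}

/* Open a descriptor and advise sequential access before the first read(). */
int open_for_sequential_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* offset 0, len 0 means "up to the end of the file" */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
```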

If you use stdio(3), you probably want a larger buffer (e.g. 64 KB or more); see setvbuf(3). Note that GNU glibc's fopen(3) may be able to use mmap when the m extension is given in the mode string.
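A sketch of the two stdio options, assuming glibc for the `m` mode flag (other C libraries may simply ignore it). The 64 KiB size just follows the suggestion above, and the helper names are hypothetical. The buffer is passed in explicitly because glibc ignores the `size` argument of setvbuf() when `buf` is NULL.

```c
#include <stdio.h>

#define BIG_BUF_SIZE (64 * 1024)   /* illustrative size, not a tuned value */

/* Portable stdio with a larger, fully buffered user buffer.
   buf must stay valid until the stream is closed. */
FILE *open_with_big_buffer(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    /* setvbuf() must be called before any other operation on the stream */
    if (setvbuf(f, buf, _IOFBF, size) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}

/* glibc extension: "m" asks fopen to try mmap(2) instead of read(2)
   (only attempted for streams opened for reading). */
FILE *open_glibc_mmap(const char *path)
{
    return fopen(path, "rm");
}

/* Read the stream character by character with fgetc(), counting newlines. */
long count_newlines(FILE *f)
{
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    return n;
}
```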

See also readahead(2) and linuxatemyram.
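A sketch of readahead(2), which is Linux-specific and needs `_GNU_SOURCE`; the helper name is hypothetical. readahead() blocks until the requested range has been read into the page cache, so this front-loads the disk I/O rather than eliminating it.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open the file and ask Linux to pull the whole file into the page cache.
   Returns the descriptor, or -1 if open() fails. */
int open_and_readahead(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        (void)readahead(fd, 0, (size_t)st.st_size);  /* hint; failure is harmless */
    return fd;
}
```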

Don't hope for miracles: the bottleneck is the disk I/O hardware.

