pavelkolodin

Reputation: 2967

How to count cache-misses in mmap-ed memory (using eBPF)?

I would like to get timeseries

t0, misses
...
tN, misses

where tN is a timestamp (one-second resolution) and misses is the number of times the kernel performed disk I/O on behalf of my PID to load a missing page of an mmap()-ed memory region when the process accessed that memory. OK, the connection between disk I/O and memory access may be hard to track, so let's assume my program has no reason to do disk I/O other than accessing missing mmapped memory. I THINK I need to track something called node-load-misses in the perf world.

Any ideas how eBPF can be used to collect such data? What probes should I use?

I tried perf record for a similar purpose, but I dislike how much data perf records. As I recall, the attempt looked like this (I also don't remember how I parsed the resulting output.data file):

perf record -p $PID -a -F 10 -e node-loads -e node-load-misses -o output.data

I thought eBPF could provide a way to implement this with less overhead.

Upvotes: 0

Views: 1115

Answers (1)

osgx

Reputation: 94235

Loading mmapped pages that are not present in memory is not a hardware event like perf's cache-misses, node-loads or node-load-misses. When your program accesses a memory address that is not present, the hardware generates a page fault exception, which is handled in software by the Linux kernel. On the first access to anonymous memory, a physical page is allocated and mapped to that virtual address; on access to an mmapped file, disk I/O is initiated if the page is not already in the page cache. There are two kinds of page faults in Linux, minor and major; a fault that requires disk I/O is a major page fault.
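The kernel also exposes per-process minor and major fault counters in /proc/<pid>/stat (fields 10 and 12, minflt and majflt, per proc(5)), so a per-second series of major faults can be approximated by polling that file. The following is only a minimal sketch under that assumption: the function names are made up for illustration, and the counter covers every major fault of the process, not only those in one mmapped region.

```python
import time


def parse_fault_counters(stat_line):
    """Return (minflt, majflt) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so split on the LAST ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 10 (minflt) is rest[7]
    # and field 12 (majflt) is rest[9].
    return int(rest[7]), int(rest[9])


def read_fault_counters(pid):
    """Read the current cumulative (minflt, majflt) counters for a PID."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_fault_counters(f.read())


def sample_major_faults(pid, interval=1.0, samples=10):
    """Yield (unix_timestamp, major_faults_during_interval) pairs."""
    _, prev = read_fault_counters(pid)
    for _ in range(samples):
        time.sleep(interval)
        _, cur = read_fault_counters(pid)
        yield int(time.time()), cur - prev
        prev = cur
```

For example, `for ts, misses in sample_major_faults(12345): print(ts, misses)` would print one line per second for a (hypothetical) PID 12345. Polling like this records nothing per event, so its overhead is one small file read per interval.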

You should try trace-cmd, ftrace or perf trace. Support for fault tracing in the perf tool was planned in 2012, and patches were proposed in https://lwn.net/Articles/602658/

There is a tracepoint for page faults from userspace code, and this command prints those events together with the faulting memory address:

echo 2^123456%2 | perf trace -e 'exceptions:page_fault_user' bc

With a recent perf tool (https://mirrors.edge.kernel.org/pub/linux/kernel/tools/perf/) there is perf trace record, which can record both mmap syscalls and page_fault_user events into perf.data; perf script will then print all events, and they can be counted with an awk or Python script.
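As an illustration of such a counting script, here is a rough Python sketch that buckets page_fault_user events by whole second. It assumes the default perf script line layout (comm, pid, [cpu], timestamp, event name); that layout changes with the perf version and with the -F option, so the regular expression may need adjusting. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (default `perf script` output, may differ on your system):
#   bc 12345 [002] 4321.123456: exceptions:page_fault_user: address=0x... ip=0x... error_code=0x4
# Captures the integer part of the timestamp and the event name.
EVENT_RE = re.compile(r"\s(\d+)\.\d+:\s+(\S+?):(?:\s|$)")


def count_events_per_second(lines, event="exceptions:page_fault_user"):
    """Return a sorted list of (second, count) for the given event name."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m and m.group(2) == event:
            counts[int(m.group(1))] += 1
    return sorted(counts.items())
```

One way to use it is to pipe `perf script` into a small wrapper that calls `count_events_per_second(sys.stdin)`. Note that the timestamps are perf's own clock values, not wall-clock time, and that this tracepoint fires for every user-space page fault, minor and major alike.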

Some useful links on perf and tracing:
http://www.brendangregg.com/perf.html
http://www.brendangregg.com/ebpf.html
https://github.com/iovisor/bpftrace/blob/master/INSTALL.md

Some bcc and perf-tools scripts can also be used to trace disk I/O, such as https://github.com/iovisor/bcc/blob/master/examples/tracing/disksnoop.py or https://github.com/brendangregg/perf-tools/blob/master/examples/iosnoop_example.txt

For a simple time series, you can use perf stat -I 1000 with the appropriate software events:

perf stat -e cpu-clock,page-faults,minor-faults,major-faults -I 1000 ./program
...
#           time             counts unit events
     1.000112251             413.59 msec cpu-clock                 #    0.414 CPUs utilized          
     1.000112251              5,361      page-faults               #    0.013 M/sec                  
     1.000112251              5,301      minor-faults              #    0.013 M/sec                  
     1.000112251                 60      major-faults              #    0.145 K/sec                  
     2.000490561              16.32 msec cpu-clock                 #    0.016 CPUs utilized          
     2.000490561                  1      page-faults               #    0.005 K/sec                  
     2.000490561                  1      minor-faults              #    0.005 K/sec                  
     2.000490561                  0      major-faults              #    0.000 K/sec   
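To turn interval output like the above into the (tN, misses) series from the question, something along these lines could work. It is a sketch that assumes the human-readable layout shown here (timestamp, count, optional unit, event name); the function name is made up, and lines whose count is not a plain number (for example "<not counted>") are simply skipped.

```python
def parse_interval_counts(text, event="major-faults"):
    """Extract (timestamp_seconds, count) pairs for one event
    from `perf stat -I` human-readable output."""
    series = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip the '#' header line and lines belonging to other events.
        if line.lstrip().startswith("#") or event not in tokens:
            continue
        try:
            timestamp = float(tokens[0])
            # Counts may carry thousands separators, e.g. "5,361".
            count = int(tokens[1].replace(",", ""))
        except (ValueError, IndexError):
            continue  # e.g. "<not counted>" or an unexpected layout
        series.append((timestamp, count))
    return series
```

perf stat prints its statistics to stderr, so the text would typically come from a redirect such as `2> stats.txt` or from the `-o` option. The timestamps are seconds since perf stat started, so add the start time if wall-clock values are needed.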

Upvotes: 2
