Reputation: 1405
I would like to measure the L1, L2, and L3 cache hit/miss ratios of some parts of my C++ code. I am not interested in running Perf on my entire application. Can Perf be used as a library inside C++?
int main() {
...
...
start_profiling();
// The part I'm interested in
...
end_profiling();
...
...
}
I gave Intel PCM a shot, but I had two issues with it. First, it gave me some strange numbers. Second, it doesn't support L1 Cache profiling.
If it's not possible with Perf, what is the easiest way to get that information?
Upvotes: 11
Views: 3494
Reputation: 11494
Yes, there is special per-thread monitoring that lets you read perf counters from user space. See the manual page for perf_event_open(2).
Since perf's generic cache events cover only L1i, L1d, and the last-level cache, you'll need to use the PERF_TYPE_RAW event type with the event codes from the manual for your CPU.
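For a plain hit/miss ratio, counting mode is enough and no sampling is needed. The following is a minimal sketch of that idea, assuming Linux and a perf_event_paranoid setting that allows user-space counting. PerfCounter and cache_config are hypothetical names made up for this illustration. It uses the generic PERF_TYPE_HW_CACHE events; an L2 counter would instead need PERF_TYPE_RAW with a model-specific code that is not shown here.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Generic cache events are encoded as: cache id | (operation << 8) | (result << 16).
constexpr std::uint64_t cache_config(std::uint64_t cache, std::uint64_t op,
                                     std::uint64_t result) {
    return cache | (op << 8) | (result << 16);
}

// Hypothetical helper: counts one event for the calling thread.
class PerfCounter {
public:
    PerfCounter(std::uint32_t type, std::uint64_t config) {
        perf_event_attr attr;
        std::memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;
        attr.config = config;
        attr.disabled = 1;        // stay idle until start() is called
        attr.exclude_kernel = 1;  // count user-space activity only
        attr.exclude_hv = 1;
        // pid = 0 and cpu = -1: the calling thread, on whichever CPU it runs.
        fd_ = static_cast<int>(
            syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
    }
    ~PerfCounter() { if (fd_ >= 0) close(fd_); }
    PerfCounter(const PerfCounter&) = delete;
    PerfCounter& operator=(const PerfCounter&) = delete;

    // False if the kernel refused the event (permissions, unsupported event).
    bool ok() const { return fd_ >= 0; }

    void start() {
        if (!ok()) return;
        ioctl(fd_, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_, PERF_EVENT_IOC_ENABLE, 0);
    }
    void stop() {
        if (ok()) ioctl(fd_, PERF_EVENT_IOC_DISABLE, 0);
    }
    // With the default read_format, read() yields a single 64-bit count.
    bool read_count(std::uint64_t& out) const {
        return ok() &&
               ::read(fd_, &out, sizeof(out)) == static_cast<ssize_t>(sizeof(out));
    }

private:
    int fd_ = -1;
};
```

To get an L1d read miss ratio, one would create two counters, one with cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ, PERF_COUNT_HW_CACHE_RESULT_MISS) and one with PERF_COUNT_HW_CACHE_RESULT_ACCESS, call start() on both before the region of interest and stop() after it, and divide the two counts. Always check ok(), since the open can fail inside containers or with a restrictive perf_event_paranoid value.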
To implement profiling, you'll need to set sample_period, poll/select the fd or wait for the SIGIO signal, and, when it fires, read the sample and the instruction pointer from it. You may later resolve the returned instruction pointers to function names using a debugger such as GDB.
Another option is to use SystemTap. You'll need empty implementations of start_profiling() and end_profiling(), which exist only to switch SystemTap profiling on and off with a script like this:
global traceme, prof;
probe process("/path/to/your/executable").function("start_profiling") {
    traceme = 1;
}
probe process("/path/to/your/executable").function("end_profiling") {
    traceme = 0;
}
// type(4) is PERF_TYPE_RAW
probe perf.type(4).config(/* RAW value of perf event */).sample(10000) {
    // Count samples only while inside the profiled region.
    if (traceme)
        prof[usymname(uaddr())] <<< 1;
}
probe end {
    foreach([sym+] in prof) {
        printf("%16s %d\n", sym, @count(prof[sym]));
    }
}
Upvotes: 3
Reputation: 17329
Sounds like all you're trying to do is read a few perf counters, something that the PAPI library is ideal for.
The full list of supported counters is quite long, but it sounds like you're most interested in PAPI_L1_TCM, PAPI_L1_TCA, and their L2 and L3 counterparts. Note that you can also break down the accesses into reads/writes, and you can distinguish instruction and data caches.
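As a rough sketch of how those two presets could be read around a region with PAPI's low-level API: measure_l1 and miss_ratio below are hypothetical helper names, not part of PAPI, and whether PAPI_L1_TCM and PAPI_L1_TCA are available depends on the CPU (papi_avail lists what a given machine supports). The __has_include guard is only there so the sketch still compiles on machines without PAPI, in which case the region runs uncounted. With PAPI installed, link with -lpapi.

```cpp
#if __has_include(<papi.h>)
#include <papi.h>
#define HAVE_PAPI 1
#else
#define HAVE_PAPI 0
#endif

// Miss ratio from raw counts; returns 0 when nothing was accessed.
double miss_ratio(long long misses, long long accesses) {
    return accesses > 0
               ? static_cast<double>(misses) / static_cast<double>(accesses)
               : 0.0;
}

// Hypothetical helper: runs `region` and, if PAPI counters could be set up,
// stores L1 total cache misses and accesses. Returns true only when the
// counts are valid. The region always runs, counted or not.
template <typename F>
bool measure_l1(F&& region, long long& misses, long long& accesses) {
    misses = 0;
    accesses = 0;
    bool counting = false;
#if HAVE_PAPI
    int event_set = PAPI_NULL;
    counting = PAPI_library_init(PAPI_VER_CURRENT) == PAPI_VER_CURRENT
            && PAPI_create_eventset(&event_set) == PAPI_OK
            && PAPI_add_event(event_set, PAPI_L1_TCM) == PAPI_OK
            && PAPI_add_event(event_set, PAPI_L1_TCA) == PAPI_OK
            && PAPI_start(event_set) == PAPI_OK;
#endif
    region();
#if HAVE_PAPI
    if (counting) {
        long long values[2] = {0, 0};  // same order as the PAPI_add_event calls
        counting = PAPI_stop(event_set, values) == PAPI_OK;
        misses = values[0];
        accesses = values[1];
    }
#endif
    return counting;
}
```

A call such as measure_l1([&] { /* the part I'm interested in */ }, m, a) followed by miss_ratio(m, a) gives the L1 ratio; swapping in the L2 or L3 presets follows the same pattern. A real program would initialize the library once at startup and destroy the event set when done, which this sketch leaves out for brevity.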
Upvotes: 3