narengi

Reputation: 1405

Is it possible to use Linux Perf profiler inside C++ code?

I would like to measure the L1, L2 and L3 cache hit/miss ratios of some parts of my C++ code. I am not interested in running Perf on my entire application. Can Perf be used as a library inside C++?

int main() {
    ...
    ...
    start_profiling();
    // The part I'm interested in
    ...
    end_profiling();
    ...
    ...
}

I gave Intel PCM a shot, but I had two issues with it. First, it gave me some strange numbers. Second, it doesn't support L1 Cache profiling.

If it's not possible with Perf, what is the easiest way to get that information?

Upvotes: 11

Views: 3494

Answers (2)

myaut

Reputation: 11494

Yes, there is a special per-thread monitoring mode that lets you read perf counters from userspace. See the manual page for perf_event_open(2).

Since perf supports only L1i, L1d, and last-level cache events, you'll need to use PERF_TYPE_RAW mode for anything else (such as L2), with raw event numbers taken from the manual for your CPU.
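As a rough illustration of the counting mode, here is a minimal sketch that counts one event for the calling thread around a region of code. The helper names (cache_config, make_counting_attr, start_profiling, end_profiling) are made up for this example and only mirror the names in the question; the syscall, ioctl requests, and constants come from perf_event_open(2). It assumes Linux with the kernel headers installed and a perf_event_paranoid setting that permits user-level counting. For an L2 event you would pass PERF_TYPE_RAW and a raw code from your CPU manual instead of the generic L1d event.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Encodes a generic cache event as documented in perf_event_open(2):
// config = cache id | (operation id << 8) | (result id << 16).
inline uint64_t cache_config(uint64_t cache, uint64_t op, uint64_t result) {
    return cache | (op << 8) | (result << 16);
}

// Builds an attr for plain counting (no sampling), user space only.
// For L2 and other CPU-specific events, pass PERF_TYPE_RAW as the type
// and the raw code from your CPU manual as the config.
inline perf_event_attr make_counting_attr(uint32_t type, uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;        // stay idle until explicitly enabled
    attr.exclude_kernel = 1;  // count user-space activity only
    attr.exclude_hv = 1;
    return attr;
}

// Hypothetical helper: opens the counter for the calling thread on any CPU
// and starts it. Returns the fd, or -1 if the kernel refused the event.
inline int start_profiling(perf_event_attr attr) {
    int fd = static_cast<int>(
        syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
    if (fd < 0) return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    return fd;
}

// Hypothetical helper: stops the counter, reads its value, closes the fd.
inline bool end_profiling(int fd, uint64_t& count) {
    if (fd < 0) return false;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    bool ok = read(fd, &count, sizeof(count)) ==
              static_cast<ssize_t>(sizeof(count));
    close(fd);
    return ok;
}

// Usage sketch:
//   int fd = start_profiling(make_counting_attr(
//       PERF_TYPE_HW_CACHE,
//       cache_config(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ,
//                    PERF_COUNT_HW_CACHE_RESULT_MISS)));
//   /* the part you're interested in */
//   uint64_t misses = 0;
//   if (end_profiling(fd, misses)) { /* use misses */ }
```

To get a hit/miss ratio you would open a second counter with PERF_COUNT_HW_CACHE_RESULT_ACCESS in the same way and divide the two values.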

To implement sampling-based profiling, you'll need to set sample_period, poll/select the fd or wait for the SIGIO signal, and when it fires, read the sample and the instruction pointer from it. You may later resolve the returned instruction pointers to function names using a debugger such as GDB.
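The sampling setup described above could look roughly like the following sketch. The function names (make_sampling_attr, ring_copy, drain_sample_ips) are hypothetical; the struct fields and constants are those from perf_event_open(2). It assumes sample_type is exactly PERF_SAMPLE_IP, so the instruction pointer is the first field of every PERF_RECORD_SAMPLE, and that you have already mmap'ed the fd as one metadata page followed by a power-of-two-sized data area.

```cpp
#include <linux/perf_event.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Builds an attr that records the instruction pointer every `period` events.
inline perf_event_attr make_sampling_attr(uint32_t type, uint64_t config,
                                          uint64_t period) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.sample_period = period;
    attr.sample_type = PERF_SAMPLE_IP;  // each sample carries only the IP
    attr.wakeup_events = 1;             // make the fd readable after every sample
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return attr;
}

// Copies `len` bytes starting at logical position `pos` out of a ring
// buffer whose size is a power of two, handling wrap-around.
inline void ring_copy(void* dst, const unsigned char* ring, size_t ring_size,
                      uint64_t pos, size_t len) {
    for (size_t i = 0; i < len; ++i)
        static_cast<unsigned char*>(dst)[i] = ring[(pos + i) & (ring_size - 1)];
}

// Consumes all pending records and returns the IPs of the sample records.
// `meta` is the first mmap'ed page; `ring` is the data area that follows it.
inline std::vector<uint64_t> drain_sample_ips(perf_event_mmap_page* meta,
                                              const unsigned char* ring,
                                              size_t ring_size) {
    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    std::vector<uint64_t> ips;
    uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = meta->data_tail;
    while (tail < head) {
        perf_event_header hdr;
        ring_copy(&hdr, ring, ring_size, tail, sizeof(hdr));
        if (hdr.size < sizeof(hdr)) break;  // malformed record, stop reading
        if (hdr.type == PERF_RECORD_SAMPLE &&
            hdr.size >= sizeof(hdr) + sizeof(uint64_t)) {
            uint64_t ip = 0;
            ring_copy(&ip, ring, ring_size, tail + sizeof(hdr), sizeof(ip));
            ips.push_back(ip);
        }
        tail += hdr.size;
    }
    // Tell the kernel that everything up to `tail` has been consumed.
    __atomic_store_n(&meta->data_tail, tail, __ATOMIC_RELEASE);
    return ips;
}
```

In a real program you would open the event with perf_event_open, mmap the fd with PROT_READ | PROT_WRITE and MAP_SHARED, enable it, and call something like drain_sample_ips each time poll reports the fd as readable.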


Another option is to use SystemTap. You'll need empty implementations of start_profiling() and end_profiling() in your program, which serve only as markers that switch SystemTap profiling on and off, with a script something like this:

global traceme, prof;

probe process("/path/to/your/executable").function("start_profiling") {
    traceme = 1;
}

probe process("/path/to/your/executable").function("end_profiling") {
    traceme = 0;
}

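/* type 4 is PERF_TYPE_RAW */
probe perf.type(4).config(/* RAW value of perf event */).sample(10000) {
    /* count samples only between start_profiling() and end_profiling() */
    if (traceme)
        prof[usymname(uaddr())] <<< 1;
}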

probe end {
    foreach([sym+] in prof) {
        printf("%16s %d\n", sym, @count(prof[sym]));
    }
}

Upvotes: 3

Adam

Reputation: 17329

Sounds like all you're trying to do is read a few perf counters, something that the PAPI library is ideal for.

Example.

The full list of supported counters is quite long, but it sounds like you're most interested in PAPI_L1_TCM, PAPI_L1_TCA, and their L2 and L3 counterparts. Note that you can also break down the accesses into reads/writes, and you can distinguish instruction and data caches.

Upvotes: 3
