Reputation: 2382
Is there a way to read performance counters periodically in Linux?
Something like perf stat with the ability to sample every X cycles is what I'm looking for.
Basically, I would like to read the instruction counter (number of instructions executed) every X CPU cycles for some program.
Upvotes: 7
Views: 4597
Reputation: 1293
Good news: In the next kernel (Linux 3.9), perf stat will have an option -I msecs to print event deltas at regular time intervals.
https://patchwork.kernel.org/patch/2004891/
$ perf stat -I 1000 -e cycles noploop 10
noploop for 10 seconds
1.000086918 2385155642 cycles # 0.000 GHz
2.000267937 2392279774 cycles # 0.000 GHz
3.000385400 2390971450 cycles # 0.000 GHz
4.000504408 2390996752 cycles # 0.000 GHz
5.000626878 2390853097 cycles # 0.000 GHz
http://man7.org/linux/man-pages/man1/perf-stat.1.html
-I msecs, --interval-print msecs
Print count deltas every N milliseconds (minimum: 10ms)
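If you want the per-interval deltas as numbers rather than text, a small parser is enough. The following is a minimal sketch that assumes lines shaped like the output above (timestamp, count, event name); the function name parse_interval_lines is made up for illustration, and real perf output may differ between versions and locales.

```python
import re

# Matches lines such as "1.000086918 2385155642 cycles # 0.000 GHz".
# Commas are allowed in the count because some locales print separators.
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+([\d,]+)\s+(\S+)")


def parse_interval_lines(lines):
    """Yield (timestamp_seconds, count, event) for each interval line."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip headers, blank lines, and the program's own output
        ts, count, event = m.groups()
        yield float(ts), int(count.replace(",", "")), event


sample = """\
noploop for 10 seconds
1.000086918 2385155642 cycles # 0.000 GHz
2.000267937 2392279774 cycles # 0.000 GHz
"""
rows = list(parse_interval_lines(sample.splitlines()))
for ts, count, event in rows:
    print(f"{ts:.3f}s {event}: {count}")
```

Note that perf stat prints its counters to stderr, so you would redirect that stream (for example with 2> in the shell) before feeding it to a parser like this.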
Upvotes: 8
Reputation: 4223
You can easily modify perf stat to do this.
In fact, I already have a crude modification implemented and would be glad to share it with you.
The changes are mostly in the run_perf_stat function, around the while(!done) loop.
Move the lines that follow while(!done) {sleep(1);} inside the loop, and replace the sleep with a nanosleep using the period you wish to sample at.
That should make perf print the counters to stdout (or stderr) once per period.
If you wish to store these values, I suggest creating a two-dimensional array of struct stats, updating it with every sample, and writing it to a file periodically.
Upvotes: 0
Reputation: 12156
It seems that the perf tool in Linux works by recording an event when the counters reach a specific value, rather than sampling at regular intervals.
The command perf record -e cycles,instructions -c 10000 stores an event every 10000 cycles and every 10000 instructions. It can be run against a new command or an existing pid. It records to perf.data in the current directory.
Analyzing the data is another matter. Using perf script gets you quite close:
ls 16040 2152149.005813: cycles: c113a068 ([kernel.kallsyms])
ls 16040 2152149.005820: cycles: c1576af0 ([kernel.kallsyms])
ls 16040 2152149.005827: cycles: c10ed6aa ([kernel.kallsyms])
ls 16040 2152149.005831: instructions: c1104b30 ([kernel.kallsyms])
ls 16040 2152149.005835: cycles: c11777c1 ([kernel.kallsyms])
ls 16040 2152149.005842: cycles: c10702a8 ([kernel.kallsyms])
...
You need to write a script that takes a batch of lines from that output and counts the number of 'cycles' and 'instructions' events in that batch. You can adjust the resolution by changing the -c 10000 parameter in the recording command.
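Such a script could look roughly like the sketch below. It groups the perf script lines into fixed time windows and multiplies each event by the -c period to estimate the counter delta per window. The function name count_events and the window size are my own choices for illustration, and the regular expression assumes the line layout shown above, which can vary with the perf version and the fields you select.

```python
import re
from collections import Counter, defaultdict

# Picks the timestamp and event name out of a line such as
# "ls 16040 2152149.005813: cycles: c113a068 ([kernel.kallsyms])".
EVENT_RE = re.compile(r"\s(\d+\.\d+):\s+(\S+?):\s")


def count_events(lines, window, period):
    """Return {window_index: Counter(event -> estimated count)}.

    window is the bucket width in seconds; period is the -c value used
    with perf record, so each recorded event stands for `period` counts.
    """
    buckets = defaultdict(Counter)
    t0 = None
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not an event line
        ts, event = float(m.group(1)), m.group(2)
        if t0 is None:
            t0 = ts  # windows are measured from the first event seen
        buckets[int((ts - t0) / window)][event] += period
    return dict(buckets)


sample = """\
ls 16040 2152149.005813: cycles: c113a068 ([kernel.kallsyms])
ls 16040 2152149.005820: cycles: c1576af0 ([kernel.kallsyms])
ls 16040 2152149.005827: cycles: c10ed6aa ([kernel.kallsyms])
ls 16040 2152149.005831: instructions: c1104b30 ([kernel.kallsyms])
ls 16040 2152149.005835: cycles: c11777c1 ([kernel.kallsyms])
ls 16040 2152149.005842: cycles: c10702a8 ([kernel.kallsyms])
"""
result = count_events(sample.splitlines(), window=20e-6, period=10000)
for index in sorted(result):
    print(index, dict(result[index]))
```

In practice you would pipe the output of perf script into the script and pass sys.stdin instead of the sample string.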
I verified the analysis by running perf stat and perf record against ls /. perf stat reported 2 634 205 cycles and 1 725 255 instructions, while the perf script output had 410 cycles events and 189 instructions events. The smaller the -c value, the more overhead there seems to be in the cycles reading.
There is also a -F option to perf record, which samples at a fixed frequency rather than every N events. However, I could not find a way to retrieve the counter values when using this option.
Edit: perf stat apparently works on pids also, and captures data until Ctrl-C is pressed. It should be quite easy to modify the source so that it always captures for N seconds, and then run it in a loop.
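A way to get a similar effect without touching the source is to give perf stat a sleep command as its workload while attaching to the pid, so each run ends after N seconds, and then repeat that from a wrapper. The sketch below assumes perf and sleep are on PATH; build_cmd and sample_forever are hypothetical helper names. Counts that occur between two invocations are lost, so this is only an approximation of continuous sampling.

```python
import subprocess
import sys


def build_cmd(pid, seconds, events=("cycles", "instructions")):
    """Build a perf stat command that counts events on pid for `seconds`."""
    return [
        "perf", "stat",
        "-x", ",",                # machine-readable, comma-separated output
        "-e", ",".join(events),
        "-p", str(pid),
        "sleep", str(seconds),    # perf stat stops when this workload exits
    ]


def sample_forever(pid, seconds):
    """Run perf stat repeatedly and print one block of counters per run."""
    while True:
        result = subprocess.run(
            build_cmd(pid, seconds), capture_output=True, text=True
        )
        if result.returncode != 0:
            break  # perf failed, e.g. the target pid no longer exists
        # perf stat writes its counters to stderr, not stdout.
        print(result.stderr.strip(), flush=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python perf_loop.py <pid> <seconds>
    sample_forever(int(sys.argv[1]), float(sys.argv[2]))
```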
Upvotes: 5