ANTHONY

Reputation: 353

How to resolve "not counted" in perf?

perf stat -d ./sample.out

Output is:

Performance counter stats for './sample.out':

          0.586266 task-clock (msec)         #    0.007 CPUs utilized          
                 2 context-switches          #    0.003 M/sec                  
                 1 cpu-migrations            #    0.002 M/sec                  
               116 page-faults               #    0.198 M/sec                  
          7,35,790 cycles                    #    1.255 GHz                     [81.06%]
     <not counted> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses           
   <not supported> L1-dcache-loads:HG      
     <not counted> L1-dcache-load-misses:HG
     <not counted> LLC-loads:HG            
   <not supported> LLC-load-misses:HG      

       0.088013919 seconds time elapsed

I have read about why <not counted> shows up, but I am getting <not counted> even for basic counters like instructions and branches. Can anyone suggest how to make it work?

The interesting thing is:

sudo perf stat sleep 3

gives output:

Performance counter stats for 'sleep 3':

          0.598484 task-clock (msec)         #    0.000 CPUs utilized          
                 2 context-switches          #    0.003 M/sec                  
                 0 cpu-migrations            #    0.000 K/sec                  
               181 page-faults               #    0.302 M/sec                  
     <not counted> cycles                  
     <not counted> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
     <not counted> instructions            
     <not counted> branches                
     <not counted> branch-misses

sudo perf stat -C 1 sleep 3

 Performance counter stats for 'CPU(s) 1':

       3002.640578 task-clock (msec)         #    1.001 CPUs utilized           [100.00%]
               425 context-switches          #    0.142 K/sec                   [100.00%]
                 9 cpu-migrations            #    0.003 K/sec                   [100.00%]
                 5 page-faults               #    0.002 K/sec                  
       7,82,97,019 cycles                    #    0.026 GHz                     [33.32%]
       9,38,21,585 stalled-cycles-frontend   #  119.83% frontend cycles idle    [33.32%]
   <not supported> stalled-cycles-backend  
       3,09,81,643 instructions              #    0.40  insns per cycle        
                                             #    3.03  stalled cycles per insn [33.32%]
         70,15,390 branches                  #    2.336 M/sec                   [33.32%]
          6,38,644 branch-misses             #    9.10% of all branches         [33.32%]

       3.001075650 seconds time elapsed

Why does this unexpectedly work?

Thank you

Upvotes: 4

Views: 5095

Answers (2)

Peter Cordes

Reputation: 365881

sudo perf stat -C 1 sleep 3 profiles everything that happens on CPU 1, all processes and kernel code. That's why sudo is required. That's also why the task-clock is ~3002 ms.

perf stat sleep 3 (which doesn't need sudo) profiles only the sleep(1) process itself. The task-clock measured it at ~0.6 ms of CPU time.


sleep hardly does anything itself; most of the instructions that run are in the dynamic linker. As @osgx's answer points out, you're missing counts because perf doesn't have enough hardware counters on your machine, so it's multiplexing them. The events that show <not counted> must have had their turn on the hardware only while sleep was asleep, never while it was actually running.

For good results, put your microbenchmark in a loop that runs at least a hundred milliseconds, preferably ~1 sec for good signal-to-noise ratio, depending on what counters you're counting.
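Looping inside the program itself is the cleaner option. When the source can't be changed, a rough fallback is to repeat the whole command from a shell, as in the sketch below. The helper name run_many is hypothetical, and note the caveat: every iteration also pays for process startup and dynamic linking, so those costs end up in the counts too.

```shell
#!/bin/sh
# run_many N CMD [ARGS...]: run CMD N times in a row (hypothetical helper).
# Stops early and returns 1 if any iteration fails.
run_many() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
}
```

Saved to a file (say run_many.sh, an assumed name), it could be measured with something like perf stat -- sh -c '. ./run_many.sh; run_many 1000 ./sample.out'. By default perf stat also counts child processes, so all iterations land in one set of totals.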

Upvotes: -1

osgx

Reputation: 94465

The typical problem with perf stat -d on very short programs is not statistical sampling but multiplexing: the percentage in square brackets (for example [33.32%]) means that counter was active for only about 33% of the running time.

You are asking the PMU (performance monitoring unit of the CPU) to monitor too many events at once, and perf cannot map all the requested events onto real hardware counters at the same time. A typical PMU has something like 4, 7, or 8 independent counters, and that number may be halved if SMT is enabled (for example HT, Hyper-Threading).

When you ask perf to count this many events (you have 6 supported HW events in your perf stat output), it divides them into smaller groups. The kernel rotates between groups at certain points in time, whenever perf_events gets a chance to switch them, for example on a task-clock tick (~3 ms).
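The arithmetic behind the bracketed percentage can be sketched like this. As far as I know, perf tracks for each event how long it was enabled and how long it was actually running on a hardware counter, then scales the raw count by enabled/running. The function names and the sample numbers below are made up for illustration.

```shell
#!/bin/sh
# scale_count RAW ENABLED_NS RUNNING_NS
# Estimate the full-run count from a multiplexed raw count.
# If the event never got onto a hardware counter there is nothing to scale.
scale_count() {
    awk -v raw="$1" -v en="$2" -v run="$3" 'BEGIN {
        if (run == 0) { print "<not counted>"; exit }
        printf "%.0f\n", raw * en / run
    }'
}

# coverage_pct ENABLED_NS RUNNING_NS
# The share of time the event was really counting (the [..%] column).
coverage_pct() {
    awk -v en="$1" -v run="$2" 'BEGIN {
        if (en == 0) { print "n/a"; exit }
        printf "%.2f%%\n", 100 * run / en
    }'
}

# Illustrative numbers only: raw count 1000, enabled 3 ms, running 1 ms.
scale_count 1000 3000000 1000000    # prints 3000
coverage_pct 3000000 1000000        # prints 33.33%
```

For a process that runs for well under one rotation interval, most events get a running time of zero, which would explain the <not counted> lines in the question.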

You can split the measurement into several runs with smaller sets of events: any number of SW events plus 2-4 HW events per run.

perf stat -e task-clock,page-faults,cycles,stalled-cycles-frontend ./sample.out
perf stat -e task-clock,page-faults,cycles,instructions ./sample.out
perf stat -e task-clock,page-faults,branches,branch-misses ./sample.out
perf stat -e task-clock,page-faults,L1-dcache-load-misses:HG,LLC-loads:HG ./sample.out
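If you do this often, the per-group runs above can be generated by a small script. This is only a sketch: the function name perf_group_cmds is hypothetical, the event groups are the ones listed above, and ./sample.out stands in for whatever you are measuring. It only prints the command lines, so you can inspect them before running anything.

```shell
#!/bin/sh
# Software events are not limited by hardware counters, so every run keeps them.
SW_EVENTS="task-clock,page-faults"

# perf_group_cmds TARGET: print one perf stat command per small HW event group.
perf_group_cmds() {
    target=$1
    for hw in \
        "cycles,stalled-cycles-frontend" \
        "cycles,instructions" \
        "branches,branch-misses" \
        "L1-dcache-load-misses:HG,LLC-loads:HG"
    do
        echo "perf stat -e $SW_EVENTS,$hw $target"
    done
}

# Print the four command lines for the asker's binary.
perf_group_cmds ./sample.out
```

Piping the output to sh (perf_group_cmds ./sample.out | sh) runs the groups one after another. Each run is a separate execution of the program, so counts from different runs are only comparable if the program behaves the same way every time.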

Upvotes: 12
