precision

Reputation: 303

Confusing perf stat results after multiple runs

I ran a 1000x1000 matrix multiplication program six times in a row under perf stat -e cache-misses and got the following results:

Observation   Cache-misses   Time elapsed (sec)
1             48822173       7.697147087
2             48663517       7.710045908
3             48667119       7.701690126
4             48867057       7.766267284
5             48610651       7.701600681
6             49203583       7.719180737

As we can see, observation 1 has more cache misses than observations 2, 3 and 5, yet its elapsed time is shorter than theirs. On the other hand, observation 4 has the highest elapsed time of all the observations, but fewer cache misses than observation 6. According to the textbook, more cache misses lengthen the execution time of a program. How can we explain this behavior? Thanks

Here are my system details:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               2300.000
BogoMIPS:              4589.89
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3

Upvotes: 1

Views: 795

Answers (1)

GHugo

Reputation: 2654

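Several tools exist to find the root cause of your cache misses. But a high number of misses does not always mean a longer execution time. It also depends on the cache level at which the misses occur: a miss that is served by the next cache level costs far less than one that has to go to main memory.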
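To put the numbers from the question in perspective, here is a minimal sketch (Python, standard library only) that computes how much the six runs differ from each other. The values are copied from the table in the question; the helper name relative_spread is made up for illustration.

```python
from statistics import mean

# Values copied from the six perf stat runs in the question.
cache_misses = [48822173, 48663517, 48667119, 48867057, 48610651, 49203583]
elapsed_sec = [7.697147087, 7.710045908, 7.701690126,
               7.766267284, 7.701600681, 7.719180737]


def relative_spread(values):
    """Return (max - min) / mean, i.e. the run-to-run spread as a fraction."""
    return (max(values) - min(values)) / mean(values)


miss_spread = relative_spread(cache_misses)
time_spread = relative_spread(elapsed_sec)

print(f"cache-misses spread: {miss_spread:.2%}")
print(f"elapsed time spread: {time_spread:.2%}")
```

Both spreads come out at roughly one percent, so the runs differ very little in either metric, and differences that small are hard to attribute to cache misses alone.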

Moreover, it is recommended to do one or two warm-up runs without collecting statistics so that the caches are filled with data: subsequent runs then benefit from the data the first run brought into the cache.
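The warm-up advice could be scripted along these lines. This is only a sketch under assumptions: ./matmul is a placeholder for the program in the question, the helper names are made up, and the per-level events (L1-dcache-load-misses, LLC-load-misses) are perf aliases that may not be available on every CPU or kernel; perf list shows what a given machine supports.

```python
import subprocess

# Placeholder: path to the matrix multiplication binary (not given in the question).
BINARY = "./matmul"

# cache-misses is the event from the question; the other two are assumed to be
# available on this machine and split the misses by cache level.
EVENTS = ("cache-misses", "L1-dcache-load-misses", "LLC-load-misses")


def build_perf_cmd(binary, events, repeat):
    """Build a perf stat command that runs `binary` `repeat` times."""
    return ["perf", "stat", "-r", str(repeat), "-e", ",".join(events), binary]


def warm_up(binary, runs=2):
    """Run the program a few times without collecting statistics."""
    for _ in range(runs):
        subprocess.run([binary], check=True, stdout=subprocess.DEVNULL)


def measure(binary, events=EVENTS, repeat=6):
    """Run the program under perf stat; perf prints its report to stderr."""
    subprocess.run(build_perf_cmd(binary, events, repeat), check=True)


# Typical use (not executed here, since it needs perf and the binary):
#   warm_up(BINARY)
#   measure(BINARY)
```

With -r, perf stat runs the command repeatedly and reports the mean of each counter together with its run-to-run variation, which makes small differences between individual runs easier to judge.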

A tool like dprof can help you find the causes of performance problems due to cache misses. Try it.

Upvotes: 3
