Reputation: 303
I ran a 1000x1000 matrix multiplication program 6 times in a row under perf stat -e cache-misses and got the following results:
Observation  Cache-misses  Time elapsed (sec)
1            48822173      7.697147087
2            48663517      7.710045908
3            48667119      7.701690126
4            48867057      7.766267284
5            48610651      7.701600681
6            49203583      7.719180737
As we can see, the cache-miss count for observation 1 is higher than for observations 2, 3 and 5, yet its elapsed time is lower than theirs. Conversely, observation 4 has the highest elapsed time of all six runs, but fewer cache-misses than observations 3 and 6. According to the textbook, more cache-misses should lengthen a program's execution time. How can this behavior be explained? Thanks
Here are my system details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Stepping: 7
CPU MHz: 2300.000
BogoMIPS: 4589.89
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
Upvotes: 1
Views: 795
Reputation: 2654
Several tools exist to find the root cause of your cache misses. But a higher miss count does not always mean a longer execution time: it also depends on which cache level the misses occur at (a miss that is served by L2 or L3 costs far fewer cycles than one that goes all the way to DRAM).
Moreover, it is recommended to do one or two warm-up runs without collecting statistics, so that the caches are already filled with the working set: subsequent runs then benefit from the data the first run loaded.
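As a concrete sketch of that advice (assuming the benchmark binary is called ./matmul, a hypothetical name), you can discard one warm-up run and then let perf stat repeat the measurement itself, which reports mean and standard deviation across runs:

```shell
# one untimed warm-up run to populate the caches
./matmul > /dev/null

# repeat the measurement 5 times; perf prints the mean
# and +/- spread for both the event count and elapsed time
perf stat -r 5 -e cache-misses ./matmul
```

The -r (repeat) mode makes the run-to-run noise visible directly, instead of eyeballing six separate perf stat outputs.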
A tool like dprof can help you find the causes of performance problems due to cache misses. Try it.
Upvotes: 3