Reputation: 149
The question "cpu cache performance. store misses vs load misses" has no answer explaining where to find documentation for the events listed by perf list.
I can't find it with man perf or perf help list.
I read the event documentation for Intel 64 and AMD64, where events are described in a format like the following:
Last Level Cache References — Event select 2EH, Umask 4FH
So where is it?
Edit: To be clear, I am looking for the documentation of the events listed by perf list.
Upvotes: 2
Views: 1744
Reputation: 94465
The list of predefined perf events such as branches, cycles, and LLC-load-misses is documented by the source code of the perf subsystem inside the Linux kernel. These names are mapped only partially, and to different hardware events on different CPU models and microarchitectures. If your CPU is Intel, it can be more useful to use ocperf.py (and toplev.py) from andikleen's pmu-tools with the event names from the Intel documentation. ocperf is not official, but it is written by an Intel employee and uses the official lists from https://download.01.org/perfmon/ (see https://download.01.org/perfmon/readme.txt: "This package contains performance monitoring event lists for Intel processors").
For x86 and x86_64 perf, these (ancient) predefined/generic names are mapped in the arch/x86/events directory. For example, for all Intel Core microarchitectures, check arch/x86/events/intel/core.c and search for the microarchitecture by its code name (Core, Core2, NHM=Nehalem, WSM=Westmere, SNB=SandyBridge, IVB=IvyBridge, HSW=HaSWell, BDW=BroaDWell, SKL=SKyLake, SLM=SiLverMont, and others from lists and amd). For Skylake there is a structure at line 394 of intel/core.c in kernel 4.15.8, and we can see that PREFETCH counters are not mapped for any of the caches ("not supported"):
static __initconst const u64 skl_hw_cache_event_ids
    [ C(L1D ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_INST_RETIRED.ALL_LOADS */
            [ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_INST_RETIRED.ALL_STORES */
            [ C(RESULT_MISS) ] = 0x0,
...
    [ C(LL ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE */
            [ C(RESULT_MISS) ] = 0x1b7, /* OFFCORE_RESPONSE */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE */
            [ C(RESULT_MISS) ] = 0x1b7, /* OFFCORE_RESPONSE */
        },
There is also an extra structure that defines additional flags/masks for events like OFFCORE_RESPONSE:
static __initconst const u64 skl_hw_cache_extra_regs
    [ C(LL ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = SKL_DEMAND_READ|
                SKL_LLC_ACCESS|SKL_ANY_SNOOP,
            [ C(RESULT_MISS) ] = SKL_DEMAND_READ|
                SKL_L3_MISS|SKL_ANY_SNOOP|
                SKL_SUPPLIER_NONE,
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = SKL_DEMAND_WRITE|
                SKL_LLC_ACCESS|SKL_ANY_SNOOP,
            [ C(RESULT_MISS) ] = SKL_DEMAND_WRITE|
                SKL_L3_MISS|SKL_ANY_SNOOP|
                SKL_SUPPLIER_NONE,
    [ C(NODE) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = SKL_DEMAND_READ|
                SKL_L3_MISS_LOCAL_DRAM|SKL_SNOOP_DRAM,
            [ C(RESULT_MISS) ] = SKL_DEMAND_READ|
                SKL_L3_MISS_REMOTE|SKL_SNOOP_DRAM,
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = SKL_DEMAND_WRITE|
                SKL_L3_MISS_LOCAL_DRAM|SKL_SNOOP_DRAM,
            [ C(RESULT_MISS) ] = SKL_DEMAND_WRITE|
                SKL_L3_MISS_REMOTE|SKL_SNOOP_DRAM,
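To connect the hex values in these tables with the "Event select / Umask" notation from the vendor manuals, here is a minimal sketch in Python. It assumes the basic x86 core-PMU layout, where bits 0-7 of the config hold the event select and bits 8-15 hold the unit mask; it ignores the other config bits and any extended event-select bits. The helper names encode_raw and decode_config are made up for this illustration and are not part of perf or pmu-tools.

```python
# Sketch only: assumes the basic x86 layout (bits 0-7 = event select,
# bits 8-15 = umask). encode_raw/decode_config are hypothetical helpers.

def encode_raw(event_select, umask):
    """Build a perf raw-event string (r + hex of umask:event) from manual values."""
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and umask must each fit in 8 bits")
    return f"r{(umask << 8) | event_select:04x}"


def decode_config(config):
    """Split a kernel-table config value into (event_select, umask)."""
    return config & 0xFF, (config >> 8) & 0xFF


if __name__ == "__main__":
    # "Event select 2EH, Umask 4FH" from the question.
    print(encode_raw(0x2E, 0x4F))

    # Values taken from the Skylake table quoted above.
    for config in (0x81D0, 0x82D0, 0x151, 0x1B7):
        event, umask = decode_config(config)
        print(f"config {config:#06x}: event select {event:#04x}, umask {umask:#04x}")
```

The resulting string can be passed to perf as a raw event, for example perf stat -e r4f2e. Going the other way, decoding 0x81d0 gives event select D0H with umask 81H, which is how MEM_INST_RETIRED.ALL_LOADS would be looked up in the Intel event lists.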
Upvotes: 4