Reputation: 1245
As I understand it, the perf tool can read hardware counters available on a processor to provide performance information. For example, I know to use L1-dcache-load-misses to measure the number of times the L1 cache does not have the requested data.
I want to find out how many times my CPU has to access DRAM while running my program. Running perf list | grep dram turns up hundreds of events for which I cannot find any documentation.
So, which event should I use to measure the number of DRAM accesses?
Upvotes: 2
Views: 1505
Reputation: 2236
There are a lot of events labelled with "dram", but none of those is the one that you want. :-(
For Intel processors, it is not possible to count all of the DRAM traffic using the performance counters in the core.
The good news is that the performance counters in the memory controllers are accurate in all of the Intel systems that I have tested. The bad news is that they have different names for processors with the "client" uncore and those with the "server" uncore.
On a Xeon Gold or Xeon Platinum server system, the events can be found using:
perf list | grep -i cas
which returns:
uncore_imc_0/cas_count_read/ [Kernel PMU event]
uncore_imc_0/cas_count_write/ [Kernel PMU event]
uncore_imc_1/cas_count_read/ [Kernel PMU event]
uncore_imc_1/cas_count_write/ [Kernel PMU event]
uncore_imc_2/cas_count_read/ [Kernel PMU event]
uncore_imc_2/cas_count_write/ [Kernel PMU event]
uncore_imc_3/cas_count_read/ [Kernel PMU event]
uncore_imc_3/cas_count_write/ [Kernel PMU event]
uncore_imc_4/cas_count_read/ [Kernel PMU event]
uncore_imc_4/cas_count_write/ [Kernel PMU event]
uncore_imc_5/cas_count_read/ [Kernel PMU event]
uncore_imc_5/cas_count_write/ [Kernel PMU event]
The corresponding "perf stat" command is very long, but straightforward to construct.
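As a rough sketch of how that command could be assembled, the loop below builds the event list for the six memory controller channels (uncore_imc_0 through uncore_imc_5) shown in the listing above. The channel count and the ./my_program placeholder are assumptions; adjust both for your system. Uncore events are not tied to a single process, so the sketch uses -a (system-wide), which means the counts include DRAM traffic from everything running on the machine during the measurement, not just your program.

```shell
# Build a comma-separated list of CAS read/write events for IMC channels 0..5.
# The channel indices are taken from the "perf list" output above; other
# systems may expose a different number of uncore_imc_N PMUs.
events=""
for i in 0 1 2 3 4 5; do
    events="${events:+$events,}uncore_imc_$i/cas_count_read/,uncore_imc_$i/cas_count_write/"
done

# Print the full command instead of running it, so it can be inspected first.
# ./my_program is a hypothetical placeholder for the workload being measured.
echo "perf stat -a -e $events -- ./my_program"
```

Each CAS count corresponds to one cache-line-sized (64-byte) transfer, so summing the read and write counts across all channels gives the total number of DRAM accesses during the run.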
I don't have access to any processors with the "client" uncore to test, but documents like https://www.intel.com/content/dam/www/public/us/en/documents/manuals/6th-gen-core-family-uncore-performance-monitoring-manual.pdf indicate that the memory controller performance counters are available in those products as well.
Upvotes: 4
Reputation: 364428
(This doesn't fully answer your question; hopefully someone else with more memory-profiling experience will answer. The events I mention are present on Skylake-client; IDK about other CPUs.)
On a CPU without L4 eDRAM cache, you can count L3 misses, e.g. mem_load_retired.l3_miss for loads. (But that might count two loads to the same line as two separate misses, even though they both wait for the same LFB to fill, so the DRAM actually sees just one access.)
And it won't count DRAM accesses driven by HW prefetch. Also, that's only counting loads, not write-backs of dirty data after stores.
The offcore_response events are super complex because they consider the possibility of multi-socket systems and snooping other sockets, local vs. remote RAM, and so on. Not sure if there's one single event with dram in its name that does what you want. Also, the offcore_response events divide up between demand_code_rd, demand_data_rd, demand_rfo (store misses), and other.
There is offcore_requests.l3_miss_demand_data_rd to count demand loads (non-prefetch) that miss in L3.
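For what it's worth, a minimal sketch of counting the two core events named above for a single program might look like this. It assumes a Skylake-client CPU where perf list shows both event names; ./my_program is a hypothetical placeholder. Unlike memory controller counters, these are per-core events, so perf can attribute them to just your process, subject to the caveats above about prefetch and write-backs.

```shell
# Core-PMU events mentioned above (Skylake-client names; check "perf list"
# on your machine, since other microarchitectures may name them differently).
core_events="mem_load_retired.l3_miss,offcore_requests.l3_miss_demand_data_rd"

# Assemble and print the command; ./my_program is a hypothetical placeholder.
cmd="perf stat -e $core_events -- ./my_program"
echo "$cmd"
```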
Upvotes: 1