saad

Reputation: 1245

Using the Linux perf tool to measure the number of times the CPU has to access main memory

As I understand, the perf tool can read hardware counters available on a processor to provide performance information. For example, I know to use L1-dcache-load-misses to measure the number of times the L1 cache does not have the requested data.

I want to find out how many times my CPU has to access DRAM while running my program. Running perf list | grep dram turns up hundreds of counters, and I cannot find documentation for any of them.

So which event should I use to measure the number of DRAM accesses?

Upvotes: 2

Views: 1505

Answers (2)

John D McCalpin

Reputation: 2236

There are a lot of events labelled with "dram", but none of them is the one you want. :-(

For Intel processors, it is not possible to count all of the DRAM traffic using the performance counters in the core.

The good news is that the performance counters in the memory controllers are accurate in all of the Intel systems that I have tested. The bad news is that they have different names for processors with the "client" uncore and those with the "server" uncore.

On a Xeon Gold or Xeon Platinum server system, the events can be found using:

perf list | grep -i cas

which returns:

  uncore_imc_0/cas_count_read/                       [Kernel PMU event]
  uncore_imc_0/cas_count_write/                      [Kernel PMU event]
  uncore_imc_1/cas_count_read/                       [Kernel PMU event]
  uncore_imc_1/cas_count_write/                      [Kernel PMU event]
  uncore_imc_2/cas_count_read/                       [Kernel PMU event]
  uncore_imc_2/cas_count_write/                      [Kernel PMU event]
  uncore_imc_3/cas_count_read/                       [Kernel PMU event]
  uncore_imc_3/cas_count_write/                      [Kernel PMU event]
  uncore_imc_4/cas_count_read/                       [Kernel PMU event]
  uncore_imc_4/cas_count_write/                      [Kernel PMU event]
  uncore_imc_5/cas_count_read/                       [Kernel PMU event]
  uncore_imc_5/cas_count_write/                      [Kernel PMU event]

The corresponding "perf stat" command is very long, but straightforward to construct.
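As a rough sketch of how that command can be assembled, the shell function below builds the event list for a given number of memory controllers. The function name imc_cas_events and the program name ./my_program are placeholders, not anything from perf itself, and the count of 6 simply matches the listing above; check perf list on your own machine. Keep in mind that these counters live in the memory controllers, so they see traffic from everything running on the socket, not only your program.

```shell
#!/bin/sh
# Print a comma-separated list of read and write CAS events for
# memory controllers 0..N-1. N is supplied by the caller (assumption:
# the controllers are numbered contiguously, as in the listing above).
imc_cas_events() {
    n=$1
    i=0
    list=""
    while [ "$i" -lt "$n" ]; do
        # Append a comma only when the list already has an entry.
        list="${list:+$list,}uncore_imc_$i/cas_count_read/,uncore_imc_$i/cas_count_write/"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example invocation, left commented out because it typically needs root
# (or a relaxed perf_event_paranoid setting) and ./my_program is a placeholder:
# perf stat -a -e "$(imc_cas_events 6)" -- ./my_program
```

Summing the read and write counts across all controllers gives the total number of CAS operations seen while the command ran.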

I don't have access to any processors with the "client" uncore to test, but documents like https://www.intel.com/content/dam/www/public/us/en/documents/manuals/6th-gen-core-family-uncore-performance-monitoring-manual.pdf indicate that the memory controller performance counters are available in those products as well.

Upvotes: 4

Peter Cordes

Reputation: 364428

(This doesn't fully answer your question; hopefully someone else with more memory-profiling experience will answer. The events I mention are present on Skylake-client; I don't know about other CPUs.)

On a CPU without an L4 eDRAM cache, you can count L3 misses, e.g. mem_load_retired.l3_miss for loads. (But that might count two loads to the same line as two separate misses, even though they both wait for the same LFB (line fill buffer) to fill, so DRAM actually sees just one access.)

It also won't count DRAM accesses driven by HW prefetch, and it only counts loads, not write-backs of dirty data after stores.


The offcore_response events are super complex because they consider the possibility of multi-socket systems and snooping other sockets, local vs. remote RAM, and so on. Not sure if there's one single event with dram in its name that does what you want. Also, the offcore_response events divide up between demand_code_rd, demand_data_rd, demand_rfo (store misses), and other.

There is also offcore_requests.l3_miss_demand_data_rd, which counts demand loads (not prefetches) that miss in L3.
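For what it's worth, a minimal sketch of counting the two core-side events named in this answer for a single run might look like the following. The event names are the Skylake-client ones and may not exist on other CPUs, and ./my_program is a placeholder. As noted above, these counts only approximate DRAM traffic, since they miss prefetches and write-backs.

```shell
#!/bin/sh
# Core-side events from this answer (Skylake-client names; availability
# on other CPUs is not guaranteed, check `perf list`).
events="mem_load_retired.l3_miss,offcore_requests.l3_miss_demand_data_rd"

# Only run if perf is installed and the placeholder program exists.
if command -v perf >/dev/null 2>&1 && [ -x ./my_program ]; then
    perf stat -e "$events" -- ./my_program
fi
```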

Upvotes: 1
