user861491

Reputation: 159

What's the difference between hardware events and hardware cache events in perf?

When I ran the perf list command, I found two kinds of events: hardware events and hardware cache events. What is the difference between the two?

What is the difference between cache-misses and LLC-misses? Does cache-misses include LLC-misses?

Does the perf tool reduce the performance of the program I am testing?

Upvotes: 7

Views: 996

Answers (2)

Milind Dumbare

Reputation: 3234

Question 2: Looking at the ARM kernel code for perf ("arch/arm/kernel/perf_event_v7.c"):

cache-misses maps to ARMV7_PERFCTR_L1_DCACHE_REFILL, which is a first-level data cache miss. LLC stands for last-level cache, so LLC-misses are probably misses in the last cache level (likely L3).

You can check the architecture-specific kernel code to see what value ARMV7_PERFCTR_L1_DCACHE_REFILL has, and the technical reference manual to find out what exactly that value means: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388i/BEHCCEAE.html

Question 3: I believe perf reads the counters from hardware registers (at least for hardware performance counters), so it should not really affect the performance of your code, since it does not insert hooks into your code. However, some papers report a performance penalty of about 5% when using perf.

Upvotes: 1

Manuel Selva

Reputation: 19050

According to the man page of the perf_event_open system call (used internally by the perf user-level utilities):

  • hardware events: This indicates one of the "generalized" hardware events provided by the kernel
  • hardware cache events: This indicates a hardware cache event.
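To make the distinction a bit more concrete, here is a minimal sketch of how the two categories are encoded in the type and config fields passed to perf_event_open. The constant values are copied by hand from linux/perf_event.h, and the helper name hw_cache_config is my own, not part of any API. A generalized hardware event is a single enum value, whereas a hardware cache event packs a cache id, an operation and a result into one config word, as described in the man page.

```python
# Constants copied by hand from <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0   # "Hardware event" in perf list
PERF_TYPE_HW_CACHE = 3   # "Hardware cache event" in perf list

PERF_COUNT_HW_CACHE_MISSES = 3        # generalized cache-misses event

PERF_COUNT_HW_CACHE_LL = 2            # cache id: last-level cache
PERF_COUNT_HW_CACHE_OP_READ = 0       # operation: read (load)
PERF_COUNT_HW_CACHE_RESULT_MISS = 1   # result: miss


def hw_cache_config(cache_id, op_id, result_id):
    """Pack a cache event as the perf_event_open man page describes:
    cache id | (op id << 8) | (result id << 16).
    The function name is hypothetical, chosen for this sketch."""
    return cache_id | (op_id << 8) | (result_id << 16)


# (type, config) pair for the generalized cache-misses event.
cache_misses = (PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES)

# (type, config) pair for a last-level-cache read miss,
# which is what perf list shows as LLC-load-misses.
llc_load_misses = (
    PERF_TYPE_HW_CACHE,
    hw_cache_config(
        PERF_COUNT_HW_CACHE_LL,
        PERF_COUNT_HW_CACHE_OP_READ,
        PERF_COUNT_HW_CACHE_RESULT_MISS,
    ),
)

print("cache-misses    -> type=%d config=%#x" % cache_misses)
print("LLC-load-misses -> type=%d config=%#x" % llc_load_misses)
```

So the two names reach the kernel through different event types. Which hardware counter each one ends up on is decided by the architecture-specific kernel code, so whether they count the same thing depends on the CPU.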

Moreover, I wonder whether this is linked to what the Intel® 64 and IA-32 Architectures Software Developer’s Manual 3B calls architectural and non-architectural events.

Regardless of the exact meaning of this categorization, cache-misses, as stated in a previous question and in the man page I mentioned above, represents the number of memory accesses that could not be served by any of the caches. In other words, it counts the misses in the last-level cache. I therefore guess it is the same as LLC-misses; unfortunately, I cannot confirm that on my laptop because LLC-misses is not supported there.

Regarding your last question, the overhead incurred by performance monitoring should be very low. It comes mainly from reading the counter values, and with perf stat I guess they are read only once, at the end of the execution (assuming the counters don't overflow).
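If you would rather measure the overhead than guess, a rough sketch like the one below can help. It times a command with and without perf stat and reports the relative difference. The helper names wall_time and overhead_percent and the toy workload are my own choices for illustration. It assumes perf is on your PATH and that you are allowed to open counters; otherwise it only prints the baseline.

```python
import shutil
import subprocess
import sys
import time


def wall_time(cmd, runs=3):
    """Best wall-clock time in seconds over several runs of cmd."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def overhead_percent(baseline, measured):
    """Relative slowdown of measured versus baseline, in percent."""
    return 100.0 * (measured - baseline) / baseline


if __name__ == "__main__":
    # Toy workload (an assumption): replace it with your own program.
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    base = wall_time(workload)
    print("baseline: %.4f s" % base)
    if shutil.which("perf"):
        try:
            counted = wall_time(
                ["perf", "stat", "-e", "cache-misses", "--"] + workload)
            print("perf stat: %.4f s (%.1f%% overhead)"
                  % (counted, overhead_percent(base, counted)))
        except (subprocess.CalledProcessError, OSError):
            print("perf stat could not run here (missing permissions?)")
```

For short workloads, most of the difference is the startup cost of perf itself, so use a program that runs for at least a few seconds before trusting the percentage.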

Upvotes: 2
