Reputation: 201
When I use the perf record
on my code, I find three choices for the --call-graph
option: lbr
(last branch record), dwarf
and fp
.
What is difference between these?
Upvotes: 20
Views: 10561
Reputation: 22670
The option --call-graph
refers to the collection of call graphs / call chains, i.e. the function stack for a sample.
The default, fp
, uses frame pointers. This is very efficient but can be unreliable, particularly for optimized code. By explicitly using -fno-omit-frame-pointer
, you can ensure that this is available for your code. Nevertheless, the result for libraries may vary.
With dwarf
, perf
actually collects and stores a part of the stack memory itself and unwinds it with post-processing. This can be very resource consuming and may have limited stack depth. The default stack memory chunk is 8 kiB, but can be configured.
lbr
stands for last branch records. This is a hardware mechanism support by Intel CPUs. This will probably offer the best performance at the cost of portability. lbr
is also limited to userspace functions. (AMD Zen 4 CPUs also have a lbr
implementation, available on Linux 6.1+.)
Upvotes: 20