Reputation: 181
I know there is a profiling visualization called a flame graph that makes it quick to identify the performance bottlenecks of a binary.
Actually, I don't need the performance stats. I am only interested in the stack traces. A flame graph can visualize the whole history of stack traces, so it can contain much more information than a traditional GDB crash stack trace, which only shows the call stack at a single point in time.
So my question is: can a flame graph give me what I am looking for? I heard that flame graphs are built from samples, so I am afraid some function calls will be lost.
I found many flame graph examples on this webpage: https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#C++. The flame graph is exactly what I am looking for, because I can zoom in and out to check the function calls. The only concern is that profiling tools such as perf collect stack traces only at a fixed sampling rate, so the flame graph might not faithfully show every function call.
In short, what I am looking for is a flame graph minus the performance stats, plus a 100% complete function call history.
Upvotes: 2
Views: 929
Reputation: 181
I just figured out a way to achieve my goal. It's a hack that leverages the existing flamegraph tooling.
Step 1: instrument the code base with a patch similar to the following:
+#include <cstddef>
+#include <cstdio>
+#include <sstream>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Per-thread stack of the traced functions that are currently active.
+extern thread_local std::vector<std::string> thread_local_stack;
+
+// RAII tracer: pushes the name on construction, pops it on destruction.
+struct tracer_t {
+    tracer_t(std::string method) {
+        thread_local_stack.emplace_back(std::move(method));
+        std::ostringstream oss;
+        for (std::size_t i = 0; i < thread_local_stack.size(); ++i) {
+            if (i) {
+                oss << ";";
+            }
+            oss << thread_local_stack[i];
+        }
+        // One folded-stack line per call, e.g. "a;b;c 1".
+        std::printf("%s 1\n", oss.str().c_str());
+    }
+
+    ~tracer_t() {
+        thread_local_stack.pop_back();
+    }
+};
+
+// Drops the return type and the parameter list from __PRETTY_FUNCTION__.
+inline std::string methodName(const std::string& prettyFunction)
+{
+    std::size_t colons = prettyFunction.find("::");
+    std::size_t begin = prettyFunction.substr(0, colons).rfind(" ") + 1;
+    std::size_t end = prettyFunction.rfind("(") - begin;
+
+    return prettyFunction.substr(begin, end) + "()";
+}
+
+#define __METHOD_NAME__ methodName(__PRETTY_FUNCTION__)
+
+#define LOG_CALL tracer_t _token(__METHOD_NAME__)
Define the thread-local stack in exactly one source file:
+thread_local std::vector<std::string> thread_local_stack;
Add the LOG_CALL macro at the beginning of each function you want to trace. There might be hundreds of them. I am not aware of any tool that does this automatically, so I added it manually.
The code above works with the Clang compiler (__PRETTY_FUNCTION__ is a GCC/Clang extension).
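To make the effect of the macro concrete, here is a minimal self-contained sketch of the same idea. It is not the patch itself: ScopedTrace, TRACE_CALL, call_stack, folded_lines, run_demo and the three funcN functions are hypothetical names made up for this illustration, the name comes from the standard __func__ instead of __PRETTY_FUNCTION__, and the lines are collected in a vector instead of being printed so that the result is easy to inspect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for thread_local_stack and the printed output.
thread_local std::vector<std::string> call_stack;    // active call path
thread_local std::vector<std::string> folded_lines;  // emitted "a;b;c 1" lines

// Reduced version of the tracer: records one folded line per traced call.
struct ScopedTrace {
    explicit ScopedTrace(const char* name) {
        call_stack.emplace_back(name);
        std::string line;
        for (std::size_t i = 0; i < call_stack.size(); ++i) {
            if (i) line += ';';
            line += call_stack[i];
        }
        folded_lines.push_back(line + " 1");
    }
    ~ScopedTrace() {
        assert(!call_stack.empty());
        call_stack.pop_back();  // leaving the function unwinds the path
    }
};

// Assumption: __func__ is enough here because these are free functions.
#define TRACE_CALL ScopedTrace trace_token_(__func__)

void func3() { TRACE_CALL; }
void func2() { TRACE_CALL; func3(); }
void func1() { TRACE_CALL; func2(); func3(); }

// Runs the traced functions and returns the folded lines in call order.
std::vector<std::string> run_demo() {
    folded_lines.clear();
    func1();
    return folded_lines;
}
```

Calling run_demo() yields four lines in call order: "func1 1", "func1;func2 1", "func1;func2;func3 1" and "func1;func3 1". Because the destructor pops the stack on every exit path, early returns and exceptions keep the recorded path consistent.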
Step 2: compile the code and run a test case that you are interested in. The program prints the traces, which should look like this:
func1 1
func1;func2 1
func1;func2;func3 1
func1;func2;func4 1
func1;func2;func3 1
func2 1
...
This format is recognized by the flamegraph tool from https://github.com/brendangregg/FlameGraph/blob/master/flamegraph.pl. I learned the format by trying this example: https://github.com/brendangregg/FlameGraph/blob/master/files.pl
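For readers who want to post-process the trace before rendering it, the line format shown above (frames separated by ";", then a space and a count) can be split with a few lines of C++. This is only a sketch under my own assumptions: FoldedLine and parse_folded are hypothetical names, and it handles integer counts only, which is all the tracer above emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One parsed trace line. Hypothetical helper type for this sketch.
struct FoldedLine {
    std::vector<std::string> frames;  // outermost caller first
    long count = 0;
};

// Parses "a;b;c 1" into frames {a, b, c} and count 1.
// Returns false if the line has no trailing integer count.
bool parse_folded(const std::string& line, FoldedLine& out) {
    // The count is whatever follows the last space on the line.
    const std::size_t space = line.rfind(' ');
    if (space == std::string::npos || space == 0 || space + 1 == line.size()) {
        return false;
    }
    const std::string digits = line.substr(space + 1);
    if (digits.find_first_not_of("0123456789") != std::string::npos) {
        return false;
    }
    out.count = std::stol(digits);

    // Everything before that space is the ';'-separated call path.
    out.frames.clear();
    std::size_t start = 0;
    while (true) {
        const std::size_t semi = line.find(';', start);
        const std::size_t end =
            (semi == std::string::npos || semi > space) ? space : semi;
        assert(end >= start);
        out.frames.push_back(line.substr(start, end - start));
        if (end == space) break;
        start = end + 1;
    }
    return true;
}
```

Splitting on the last space rather than the first keeps frame names that contain spaces (for example templated names) intact.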
Step 3: run flamegraph.pl with the "--flamechart" option, because we want the X axis ordered by time and the automatic merging of flame graphs disabled.
The difference between a flame chart and a flame graph is this: a flame graph is designed for studying performance bottlenecks, so identical stacks are merged and their counts summed. For this post's purpose, the X axis must follow time and identical stacks recorded at different times must not be merged.
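The difference can be modeled in a few lines. The sketch below is a conceptual illustration only, not the actual logic of flamegraph.pl: Sample, merge_sorted and keep_order are hypothetical names, and collapsing only directly adjacent identical stacks in keep_order is a simplification chosen for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Sample = std::pair<std::string, long>;  // (folded stack, count)

// Flame-graph-style view: identical stacks are summed and the result is
// ordered alphabetically, so the original call order is lost.
std::vector<Sample> merge_sorted(const std::vector<Sample>& samples) {
    std::map<std::string, long> totals;
    for (const auto& s : samples) {
        assert(s.second >= 0);  // counts are expected to be non-negative
        totals[s.first] += s.second;
    }
    return std::vector<Sample>(totals.begin(), totals.end());
}

// Flame-chart-style view: input order is preserved; only directly
// adjacent identical stacks are collapsed.
std::vector<Sample> keep_order(const std::vector<Sample>& samples) {
    std::vector<Sample> out;
    for (const auto& s : samples) {
        if (!out.empty() && out.back().first == s.first) {
            out.back().second += s.second;
        } else {
            out.push_back(s);
        }
    }
    return out;
}
```

With the six example lines from step 2, merge_sorted returns five entries, with the two func1;func2;func3 calls folded into a single entry of count 2, while keep_order returns all six in their original order.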
Use the following command:
cat ~/d/output.txt | ./flamegraph.pl --hash --countname=bytes --flamechart > /tmp/out.svg
This produced the flame chart I wanted in /tmp/out.svg.
Upvotes: 1