Reputation: 135
I present the output of perf report for samples collected with perf -g -p. I don't know how to interpret the first column. There are a lot of Java_* calls that take > 90% of the time.
In the presented result there are a lot of entries, for example:
+ 98,78% 0,00% java libpthread-2.26.so
+ 95,77% 0,00% java libjvm.so
...
And my question is:
Why is there more than one such entry? Every entry is a kind of stack trace, and a single-threaded process has exactly one stack trace.
Upvotes: 2
Views: 2231
Reputation: 64895
The list you refer to here:
+ 98,78% 0,00% java libpthread-2.26.so
+ 95,77% 0,00% java libjvm.so
...
is not a stack trace. It is a list of functions that have appeared in any stack trace in any sample during the run. Only when you expand one element of that list do you see a combined tree view of the stack traces that contained that function.
I'm not sure what you mean by "the one-threaded process has exactly one stacktrace". Any process will have a variety of stack traces over its lifetime, even if it has only one thread. For example, at the moment it starts, the stack trace is simply main(); as soon as main() calls a function, the stack trace changes to include that function, and so on.
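To see this concretely, here is a small self-contained Java sketch (class and method names are mine, not from the question) that captures the method names on the current stack at a point a few calls deep; each deeper call produces a different stack trace:

```java
import java.util.ArrayList;
import java.util.List;

public class StackDemo {
    // Collect the method names on the current stack, innermost first.
    static List<String> frames() {
        List<String> names = new ArrayList<>();
        for (StackTraceElement f : Thread.currentThread().getStackTrace()) {
            names.add(f.getMethodName());
        }
        return names;
    }

    static List<String> inner() { return frames(); }
    static List<String> outer() { return inner(); }

    public static void main(String[] args) {
        // The stack here is main -> outer -> inner -> frames,
        // so the printed list contains all four method names.
        System.out.println(outer());
    }
}
```

A profiler like perf does essentially this via sampling: each sample records whatever stack exists at that instant, and the report aggregates the many distinct stacks seen over the run.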
Now in the list of functions, the first column is showing you total overhead, including all children (i.e., including functions called by this function). Since almost all of the interesting work happens in a call chain that all share the same outermost functions, the top level is a kind of useless listing where many functions are shown with >90% of the overhead.
The second column is the "own" overhead, which means the amount of time¹ actually spent in that method, not any children. This is close to zero for all the top methods, so the real work is happening in some method called by those methods, not in those methods themselves.
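A toy illustration of the self-versus-children distinction (hypothetical names, not the asker's code): a thin wrapper delegates all its work, so a sampling profiler would attribute nearly 100% "children" overhead but ~0% "self" overhead to it, while the callee gets the "self" time:

```java
public class OverheadDemo {
    // All the CPU samples would land in this loop ("self" overhead).
    static long slowChild(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    // This wrapper does no work of its own: high "children" overhead,
    // near-zero "self" overhead -- like the >90% top-level entries in
    // the question's perf report.
    static long thinParent(int n) {
        return slowChild(n);
    }

    public static void main(String[] args) {
        System.out.println(thinParent(1_000_000)); // 499999500000
    }
}
```
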
The tree view you have expanded at the top is really telling you what you need to know: ~94% of your time is spent inside/below radek.queue.wlQueue.writeBytes(), and that guy is spending most of its time in String.intern(). So the bottleneck here is all the String.intern() calls, possibly because the string table is too small, or just because String.intern() generally sucks for de-duplication. My general rule here is to use Guava's Interner<String> unless you specifically need the property that literal strings share the same string pool as interned strings (i.e., that "foo" == new String("foo").intern()).
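For illustration, here is a minimal sketch of the map-based idea behind such an interner. This is not Guava's implementation (Guava's Interners.newWeakInterner() additionally uses weak references so unused entries can be garbage-collected; this simplified version holds entries forever):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// De-duplicate strings on the ordinary Java heap instead of going
// through the JVM's native string table via String.intern().
public class SimpleInterner {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    public String intern(String s) {
        // Keep the first instance seen as the canonical one.
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String a = interner.intern(new String("foo"));
        String b = interner.intern(new String("foo"));
        System.out.println(a == b); // true: both map to one canonical instance
        // Note: unlike String.intern(), there is no guarantee that
        // "foo" == a, because literals live in the JVM string pool.
    }
}
```
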
¹ I say "amount of time" here loosely - it's really the fraction of total samples of whatever event you've specified to perf record - but by default that should be approximately CPU time.
Upvotes: 4