Reputation: 681
I am using windows machine to get performance results for logback + slf4j
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;
public class LogPerformanceAnalyser {
private static final Logger LOG =
LoggerFactory.getLogger(LogPerformanceAnalyser.class);
public LogPerformanceAnalyser() {
((ch.qos.logback.classic.Logger) LOG).setLevel(Level.ERROR);
}
public long getTimeWithCheck() {
long startTime = System.currentTimeMillis();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
if (LOG.isDebugEnabled()) {
LOG.debug("This log is {} check", "with");
}
}
return System.currentTimeMillis() - startTime;
}
public long getTimeWithoutCheck() {
long startTime = System.currentTimeMillis();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
LOG.debug("This log is {} check", "without");
}
return System.currentTimeMillis() - startTime;
}
}
Note - 1. debug is off 2. Running 10 times to get average result.
Performance Results are :
Total Time getTimeWithoutCheck: 26900 ms
Total Time getTimeWithCheck : 22536 ms
Result is that placing the check before logging saved ~3.5 sec for 21.7 billion logs.
If i changed Logger as non static :
private final Logger LOG =
LoggerFactory.getLogger(LogPerformanceAnalyser.class);
and I get the following:
Total Time getTimeWithoutCheck: 37095 ms
Total Time getTimeWithCheck : 47006 ms
Can anyone explain the this?
Upvotes: 3
Views: 2161
Reputation: 6036
First of all, you need correct benchmarks. In Java world JMH is the de-facto standard of benchmarking.
Benchmarks:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class LogBench {
private static final Logger LOG = LoggerFactory.getLogger(LogBench.class);
private final Logger localLog = LoggerFactory.getLogger(LogBench.class);
@Benchmark
public long baseline() {
return 0;
}
@Benchmark
public void getTimeWithCheck() {
if (LOG.isTraceEnabled()) {
LOG.trace("This log is {} check", "with");
}
}
@Benchmark
public void getTimeWithoutCheck() {
LOG.trace("This log is {} check", "without");
}
@Benchmark
public void getTimeWithCheckBenchLocal() {
if (localLog.isTraceEnabled()) {
localLog.trace("This log is {} check", "with");
}
}
@Benchmark
public void getTimeWithoutCheckLocal() {
localLog.trace("This log is {} check", "without");
}
}
I changed debug to trace to avoid possible effects of the cast.
And the results
Benchmark Mode Samples Score Score error Units
o.o.j.s.LogBench.baseline avgt 3 0.539 0.047 ns/op
o.o.j.s.LogBench.getTimeWithCheck avgt 3 1.030 0.083 ns/op
o.o.j.s.LogBench.getTimeWithCheckLocal avgt 3 1.637 0.571 ns/op
o.o.j.s.LogBench.getTimeWithoutCheck avgt 3 1.140 0.112 ns/op
o.o.j.s.LogBench.getTimeWithoutCheckLocal avgt 3 1.628 0.311 ns/op
You can see, conditional check is useless here, but static version is 1.6x times faster than local. Let's start exploring with the difference between getTimeWithCheckLocal and getTimeWithCheck.
Beginning of getTimeWithCheck:
[Verified Entry Point]
0x00007f06b920b440: mov DWORD PTR [rsp-0x14000],eax
0x00007f06b920b447: push rbp
0x00007f06b920b448: sub rsp,0x20 ;*synchronization entry
; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheck@-1 (line 65)
execution:
0x00007f06b920b44c: movabs r10,0x76d8bdfd0 ; {oop(a 'ch/qos/logback/classic/Logger')}
0x00007f06b920b456: mov r11d,DWORD PTR [r10+0x28] ;*getfield loggerContext
0x00007f06b920b45a: mov r9d,DWORD PTR [r12+r11*8+0x60] ;*getfield turboFilterList
0x00007f06b920b45f: mov r8d,DWORD PTR [r12+r9*8+0x10] ;*getfield array
...
[Verified Entry Point]
0x00007f1c592111e0: mov DWORD PTR [rsp-0x14000],eax
0x00007f1c592111e7: push rbp
0x00007f1c592111e8: sub rsp,0x30 ;*synchronization entry
; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@-1 (line 77)
0x00007f1c592111ec: mov r11d,DWORD PTR [rsi+0xc] ;*getfield localLog
0x00007f1c592111f0: mov r10d,DWORD PTR [r12+r11*8+0x8] ; implicit exception: dispatches to 0x00007f1c59211281
0x00007f1c592111f5: cmp r10d,0xf80154ad ; {metadata('ch/qos/logback/classic/Logger')}
0x00007f1c592111fc: jne 0x00007f1c5921123c
execution:
0x00007f1c592111fe: lea r8,[r12+r11*8] ;*invokeinterface debug
0x00007f1c59211202: mov ecx,DWORD PTR [r8+0x28] ;*getfield loggerContext
You can notice that in the second experiment JIT has to perform an additional load of the Logger field value: lea r9,[r12+r10*8]
.
Let's run benchmark again with the perfasm profiler
0.04% 0.04% │↗ 0x00007f6c25229320: mov r10d,DWORD PTR [r8+0xc] ;*getfield localLog
││ ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@1 (line 77)
││ ; - org.openjdk.jmh.samples.generated.LogBench_getTimeWithoutCheckLocal_jmhTest::getTimeWithoutCheckLocal_avgt_jmhStub@14 (line 163)
6.80% 7.29% ││ 0x00007f6c25229324: mov r11d,DWORD PTR [r12+r10*8+0x8]
││ ; implicit exception: dispatches to 0x00007f6c252294a5
0.02% ││ 0x00007f6c25229329: cmp r11d,0xf80197b1 ; {metadata('ch/qos/logback/classic/Logger')}
││ 0x00007f6c25229330: jne 0x00007f6c2522939b
││ 0x00007f6c25229332: lea r9,[r12+r10*8] ;*invokeinterface debug
││ 0x00007f6c25229336: mov ecx,DWORD PTR [r9+0x28] ;*getfield loggerContext
As you can see this additional loading isn't free. This is because a lot of ways to change final variable exists, so it is unsafe to perform this optimization for fields.
As experimental feature jvm has an special option -XX:+TrustFinalNonStaticFields
, it must be used with -XX:+UnlockExperimentalVMOptions
. If you run the benchmark with this option you'll see another results:
Benchmark Mode Cnt Score Error Units
LogBench.baseline avgt 3 2.124 ± 0.907 ns/op
LogBench.getTimeWithCheck avgt 3 0.695 ± 0.231 ns/op
LogBench.getTimeWithCheckBenchLocal avgt 3 1.608 ± 0.140 ns/op
LogBench.getTimeWithoutCheck avgt 3 0.675 ± 0.075 ns/op
LogBench.getTimeWithoutCheckLocal avgt 3 1.613 ± 0.176 ns/op
The results is strange, althrough there is no additional load of local variables now, inlining is broken, asm code contains direct calls:
0x00007f2355205d33: call 0x00007f2355046020 ; OopMap{off=120}
;*invokespecial filterAndLog_1
TrustFinalNonStaticFields
which seems to be quite unstable because it breaks CHA optimizationUpvotes: 9
Reputation: 43728
The results are more or less consistent with the assumption that a field access contributes 10000 ms to the total run time. Without the check you have this overhead once, with the check you have it twice.
It is still surprising that the difference is so much.
Upvotes: 2