Ravi
Ravi

Reputation: 681

Performance issue with logback with slf4j API

I am using windows machine to get performance results for logback + slf4j

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;

public class LogPerformanceAnalyser {
    private static final Logger LOG =
            LoggerFactory.getLogger(LogPerformanceAnalyser.class);

    public LogPerformanceAnalyser() {
        ((ch.qos.logback.classic.Logger) LOG).setLevel(Level.ERROR);
    }

    public long getTimeWithCheck() {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("This log is {} check", "with");
            }
        }
        return System.currentTimeMillis() - startTime;
    }

    public long getTimeWithoutCheck() {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            LOG.debug("This log is {} check", "without");
        }
        return System.currentTimeMillis() - startTime;
    }
}

Note - 1. debug is off 2. Running 10 times to get average result.

Performance Results are :

Total Time getTimeWithoutCheck: 26900 ms
Total Time getTimeWithCheck : 22536 ms

Result is that placing the check before logging saved ~3.5 sec for 21.7 billion logs.

If i changed Logger as non static :

private final Logger LOG = 
        LoggerFactory.getLogger(LogPerformanceAnalyser.class);

and I get the following:

Total Time getTimeWithoutCheck: 37095 ms
Total Time getTimeWithCheck : 47006 ms

Can anyone explain the this?

Upvotes: 3

Views: 2161

Answers (2)

Sergey Tselovalnikov
Sergey Tselovalnikov

Reputation: 6036

First of all, you need correct benchmarks. In Java world JMH is the de-facto standard of benchmarking.

Benchmarks:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class LogBench {
    private static final Logger LOG = LoggerFactory.getLogger(LogBench.class);

    private final Logger localLog = LoggerFactory.getLogger(LogBench.class);


    @Benchmark
    public long baseline() {
        return 0;
    }

    @Benchmark
    public void getTimeWithCheck() {
        if (LOG.isTraceEnabled()) {
            LOG.trace("This log is {} check", "with");
        }
    }

    @Benchmark
    public void getTimeWithoutCheck() {
        LOG.trace("This log is {} check", "without");
    }

    @Benchmark
    public void getTimeWithCheckBenchLocal() {
        if (localLog.isTraceEnabled()) {
            localLog.trace("This log is {} check", "with");
        }
    }

    @Benchmark
    public void getTimeWithoutCheckLocal() {
        localLog.trace("This log is {} check", "without");
    }
}

I changed debug to trace to avoid possible effects of the cast.

And the results

    Benchmark                                     Mode   Samples        Score  Score error    Units
    o.o.j.s.LogBench.baseline                     avgt         3        0.539        0.047    ns/op
    o.o.j.s.LogBench.getTimeWithCheck             avgt         3        1.030        0.083    ns/op
    o.o.j.s.LogBench.getTimeWithCheckLocal        avgt         3        1.637        0.571    ns/op
    o.o.j.s.LogBench.getTimeWithoutCheck          avgt         3        1.140        0.112    ns/op
    o.o.j.s.LogBench.getTimeWithoutCheckLocal     avgt         3        1.628        0.311    ns/op

You can see, conditional check is useless here, but static version is 1.6x times faster than local. Let's start exploring with the difference between getTimeWithCheckLocal and getTimeWithCheck.

Static LOG assembly

Beginning of getTimeWithCheck:
[Verified Entry Point]
  0x00007f06b920b440: mov    DWORD PTR [rsp-0x14000],eax
  0x00007f06b920b447: push   rbp
  0x00007f06b920b448: sub    rsp,0x20           ;*synchronization entry
                                            ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheck@-1 (line 65)

execution:
  0x00007f06b920b44c: movabs r10,0x76d8bdfd0    ;   {oop(a 'ch/qos/logback/classic/Logger')}
  0x00007f06b920b456: mov    r11d,DWORD PTR [r10+0x28] ;*getfield loggerContext
  0x00007f06b920b45a: mov    r9d,DWORD PTR [r12+r11*8+0x60]  ;*getfield turboFilterList
  0x00007f06b920b45f: mov    r8d,DWORD PTR [r12+r9*8+0x10] ;*getfield array
...     

Nonstatic LOG assembly

[Verified Entry Point]
  0x00007f1c592111e0: mov    DWORD PTR [rsp-0x14000],eax
  0x00007f1c592111e7: push   rbp
  0x00007f1c592111e8: sub    rsp,0x30           ;*synchronization entry
                                            ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@-1 (line 77)

  0x00007f1c592111ec: mov    r11d,DWORD PTR [rsi+0xc]  ;*getfield localLog
  0x00007f1c592111f0: mov    r10d,DWORD PTR [r12+r11*8+0x8] ; implicit exception: dispatches to 0x00007f1c59211281
  0x00007f1c592111f5: cmp    r10d,0xf80154ad    ;   {metadata('ch/qos/logback/classic/Logger')}
  0x00007f1c592111fc: jne    0x00007f1c5921123c

execution:
  0x00007f1c592111fe: lea    r8,[r12+r11*8]     ;*invokeinterface debug
  0x00007f1c59211202: mov    ecx,DWORD PTR [r8+0x28]  ;*getfield loggerContext                                                                                     

You can notice that in the second experiment JIT has to perform an additional load of the Logger field value: lea r9,[r12+r10*8].

Let's run benchmark again with the perfasm profiler

0.04%    0.04%  │↗  0x00007f6c25229320: mov    r10d,DWORD PTR [r8+0xc]  ;*getfield localLog
                ││                                                ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@1 (line 77)
                ││                                                ; - org.openjdk.jmh.samples.generated.LogBench_getTimeWithoutCheckLocal_jmhTest::getTimeWithoutCheckLocal_avgt_jmhStub@14 (line 163)
6.80%    7.29%  ││  0x00007f6c25229324: mov    r11d,DWORD PTR [r12+r10*8+0x8]
                ││                                                ; implicit exception: dispatches to 0x00007f6c252294a5
         0.02%  ││  0x00007f6c25229329: cmp    r11d,0xf80197b1    ;   {metadata(&apos;ch/qos/logback/classic/Logger&apos;)}
                ││  0x00007f6c25229330: jne    0x00007f6c2522939b
                ││  0x00007f6c25229332: lea    r9,[r12+r10*8]     ;*invokeinterface debug
                ││  0x00007f6c25229336: mov    ecx,DWORD PTR [r9+0x28]  ;*getfield loggerContext

As you can see this additional loading isn't free. This is because a lot of ways to change final variable exists, so it is unsafe to perform this optimization for fields.

As experimental feature jvm has an special option -XX:+TrustFinalNonStaticFields, it must be used with -XX:+UnlockExperimentalVMOptions. If you run the benchmark with this option you'll see another results:

Benchmark                            Mode  Cnt  Score   Error  Units
LogBench.baseline                    avgt    3  2.124 ± 0.907  ns/op
LogBench.getTimeWithCheck            avgt    3  0.695 ± 0.231  ns/op
LogBench.getTimeWithCheckBenchLocal  avgt    3  1.608 ± 0.140  ns/op
LogBench.getTimeWithoutCheck         avgt    3  0.675 ± 0.075  ns/op
LogBench.getTimeWithoutCheckLocal    avgt    3  1.613 ± 0.176  ns/op

The results is strange, althrough there is no additional load of local variables now, inlining is broken, asm code contains direct calls:

0x00007f2355205d33: call   0x00007f2355046020  ; OopMap{off=120}
                                            ;*invokespecial filterAndLog_1

Conclusions

  • JVM can't trust final fields, so it has to load it from memory every time in benchmark (but it wouldn't be a problem in 99.999999% applications)
  • JVM has experimental option TrustFinalNonStaticFields which seems to be quite unstable because it breaks CHA optimization

Upvotes: 9

Henry
Henry

Reputation: 43728

The results are more or less consistent with the assumption that a field access contributes 10000 ms to the total run time. Without the check you have this overhead once, with the check you have it twice.

It is still surprising that the difference is so much.

Upvotes: 2

Related Questions