Is modulo slow in Java?

Question

I've been looking at the implementation of ThreadLocal in the JDK, out of curiosity, and I found this :

/**
 * Increment i modulo len.
 */
 private static int nextIndex(int i, int len) {
     return ((i + 1 < len) ? i + 1 : 0);
 }

It looks fairly obvious that this could be implemented with a simple return (i + 1) % len, but I think these guys know their stuff. Any idea why they did this ?

This code is highly oriented towards performance, with a custom map for holding thread-local mappings, weak references to help the GC being clever and so on, so I guess this is a matter of performance. Is modulo slow in Java ?

apangin · Accepted Answer

% is avoided for performance reasons in this example.

div/rem operations are slower even on CPU architecture level; not only in Java. For example, minimum latency of idiv instruction on Haswell is about 10 cycles, but only 1 cycle for add.

Let's benchmark using JMH.

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class Modulo {
    @Param("16")
    int len;

    int i;

    @Benchmark
    public int baseline() {
        return i;
    }

    @Benchmark
    public int conditional() {
        return i = (i + 1 < len) ? i + 1 : 0;
    }

    @Benchmark
    public int mask() {
        return i = (i + 1) & (len - 1);
    }

    @Benchmark
    public int mod() {
        return i = (i + 1) % len;
    }
}

Results:

Benchmark           (len)  Mode  Cnt  Score   Error  Units
Modulo.baseline        16  avgt   10  2,951 ± 0,038  ns/op
Modulo.conditional     16  avgt   10  3,517 ± 0,051  ns/op
Modulo.mask            16  avgt   10  3,765 ± 0,016  ns/op
Modulo.mod             16  avgt   10  9,125 ± 0,023  ns/op

As you can see, using % is ~2.6x slower than a conditional expression. JIT cannot optimize this automatically in the discussed ThreadLocal code, because the divisor (table.length) is variable.

Is modulo slow in Java?

Answers (2)

Related Questions