Dici
Dici

Reputation: 25950

Is modulo slow in Java?

I've been looking at the implementation of ThreadLocal in the JDK, out of curiosity, and I found this :

/**
 * Increment i modulo len.
 */
 private static int nextIndex(int i, int len) {
     return ((i + 1 < len) ? i + 1 : 0);
 }

It looks fairly obvious that this could be implemented with a simple return (i + 1) % len, but I think these guys know their stuff. Any idea why they did this ?

This code is highly oriented towards performance, with a custom map for holding thread-local mappings, weak references to help the GC being clever and so on, so I guess this is a matter of performance. Is modulo slow in Java ?

Upvotes: 27

Views: 9062

Answers (2)

apangin
apangin

Reputation: 98294

% is avoided for performance reasons in this example.

div/rem operations are slower even on CPU architecture level; not only in Java. For example, minimum latency of idiv instruction on Haswell is about 10 cycles, but only 1 cycle for add.

Let's benchmark using JMH.

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class Modulo {
    @Param("16")
    int len;

    int i;

    @Benchmark
    public int baseline() {
        return i;
    }

    @Benchmark
    public int conditional() {
        return i = (i + 1 < len) ? i + 1 : 0;
    }

    @Benchmark
    public int mask() {
        return i = (i + 1) & (len - 1);
    }

    @Benchmark
    public int mod() {
        return i = (i + 1) % len;
    }
}

Results:

Benchmark           (len)  Mode  Cnt  Score   Error  Units
Modulo.baseline        16  avgt   10  2,951 ± 0,038  ns/op
Modulo.conditional     16  avgt   10  3,517 ± 0,051  ns/op
Modulo.mask            16  avgt   10  3,765 ± 0,016  ns/op
Modulo.mod             16  avgt   10  9,125 ± 0,023  ns/op

As you can see, using % is ~2.6x slower than a conditional expression. JIT cannot optimize this automatically in the discussed ThreadLocal code, because the divisor (table.length) is variable.

Upvotes: 32

Joseph Lust
Joseph Lust

Reputation: 19985

mod is not that slow in Java. It's implemented as the byte code instructions irem and frem for Integers and Floats respectively. The JIT does a good job of optimizing this.

In my benchmarks (see article), irem calls in JDK 1.8 take about 1 nanosecond. That's pretty quick. frem calls are about 3x slower, so use integers where possible.

If you're using Natural Integers (e.g. array indexing) and a power of 2 Divisor (e.g. 8 thread locals), then you can use a bit twiddling trick to get a 20% performance gain.

Upvotes: 5

Related Questions