Reputation: 25950
I've been looking at the implementation of ThreadLocal
in the JDK, out of curiosity, and I found this :
/**
* Increment i modulo len.
*/
private static int nextIndex(int i, int len) {
return ((i + 1 < len) ? i + 1 : 0);
}
It looks fairly obvious that this could be implemented with a simple return (i + 1) % len
, but I think these guys know their stuff. Any idea why they did this ?
This code is highly oriented towards performance, with a custom map for holding thread-local mappings, weak references to help the GC being clever and so on, so I guess this is a matter of performance. Is modulo slow in Java ?
Upvotes: 27
Views: 9062
Reputation: 98294
%
is avoided for performance reasons in this example.
div
/rem
operations are slower even on CPU architecture level; not only in Java. For example, minimum latency of idiv
instruction on Haswell is about 10 cycles, but only 1 cycle for add
.
Let's benchmark using JMH.
import org.openjdk.jmh.annotations.*;
@State(Scope.Benchmark)
public class Modulo {
@Param("16")
int len;
int i;
@Benchmark
public int baseline() {
return i;
}
@Benchmark
public int conditional() {
return i = (i + 1 < len) ? i + 1 : 0;
}
@Benchmark
public int mask() {
return i = (i + 1) & (len - 1);
}
@Benchmark
public int mod() {
return i = (i + 1) % len;
}
}
Results:
Benchmark (len) Mode Cnt Score Error Units
Modulo.baseline 16 avgt 10 2,951 ± 0,038 ns/op
Modulo.conditional 16 avgt 10 3,517 ± 0,051 ns/op
Modulo.mask 16 avgt 10 3,765 ± 0,016 ns/op
Modulo.mod 16 avgt 10 9,125 ± 0,023 ns/op
As you can see, using %
is ~2.6x slower than a conditional expression. JIT cannot optimize this automatically in the discussed ThreadLocal
code, because the divisor (table.length
) is variable.
Upvotes: 32
Reputation: 19985
mod
is not that slow in Java. It's implemented as the byte code instructions irem
and frem
for Integers and Floats respectively. The JIT does a good job of optimizing this.
In my benchmarks (see article), irem
calls in JDK 1.8 take about 1 nanosecond. That's pretty quick. frem
calls are about 3x slower, so use integers where possible.
If you're using Natural Integers (e.g. array indexing) and a power of 2 Divisor (e.g. 8 thread locals), then you can use a bit twiddling trick to get a 20% performance gain.
Upvotes: 5