Reputation: 7989
I saw that JDK-8055494 : Add C2 x86 intrinsic for BigInteger::multiplyToLen() method was fixed for 8u40 update, but I didn't expect that it will speed up BigInteger multiplication in 2x - 3x times.
Here are results of benchmarks which calculate factorial by different formulas on both JVM updates:
8u31
[info] Benchmark (n) Mode Cnt Score Error Units
[info] JavaFactorial.recursion 1000 thrpt 5 13.994 ± 0.175 ops/ms
[info] JavaFactorial.recursion 10000 thrpt 5 0.202 ± 0.054 ops/ms
[info] JavaFactorial.recursionPar 1000 thrpt 5 12.066 ± 8.011 ops/ms
[info] JavaFactorial.recursionPar 10000 thrpt 5 0.253 ± 0.055 ops/ms
[info] JavaFactorial.split 1000 thrpt 5 18.255 ± 2.656 ops/ms
[info] JavaFactorial.split 10000 thrpt 5 0.286 ± 0.063 ops/ms
8u40
[info] Benchmark (n) Mode Cnt Score Error Units
[info] JavaFactorial.recursion 1000 thrpt 5 33.704 ± 0.445 ops/ms
[info] JavaFactorial.recursion 10000 thrpt 5 0.428 ± 0.199 ops/ms
[info] JavaFactorial.recursionPar 1000 thrpt 5 38.170 ± 0.433 ops/ms
[info] JavaFactorial.recursionPar 10000 thrpt 5 0.557 ± 0.030 ops/ms
[info] JavaFactorial.split 1000 thrpt 5 46.447 ± 11.582 ops/ms
[info] JavaFactorial.split 10000 thrpt 5 0.586 ± 0.154 ops/ms
Should I expect that this JDK improvement leads to run my code faster or is this yet another casual benchmark?
Edit
How to verify my findings to prove that such speed up was due that C2 intrinsic?
Code of tested functions:
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
public class JavaFactorial {
@Param({"10", "100", "1000", "10000"})
public int n;
private static ForkJoinPool pool = new ForkJoinPool();
@Benchmark
public BigInteger loop() {
return n > 20 ? loop(1, n) : BigInteger.valueOf(fastLoop(1, n));
}
@Benchmark
public BigInteger recursion() {
return n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n));
}
@Benchmark
public BigInteger recursionPar() {
return n > 20 ? recursePar(1, n) : BigInteger.valueOf(fastLoop(1, n));
}
@Benchmark
public BigInteger split() {
return n > 180 ? split(n) : (n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n)));
}
private long fastLoop(final int n1, int n2) {
long p = n1;
while (n2 > n1) {
p = p * n2;
n2--;
}
return p;
}
private BigInteger loop(int n1, final int n2) {
final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
long p = 1;
BigInteger r = BigInteger.ONE;
while (n1 <= n2) {
if (p <= l) {
p *= n1;
} else {
r = r.multiply(BigInteger.valueOf(p));
p = n1;
}
n1++;
}
return r.multiply(BigInteger.valueOf(p));
}
private BigInteger recursion(final int n1, final int n2) {
if (n2 - n1 < 65) {
return loop(n1, n2);
}
final int nm = (n1 + n2) >> 1;
return recursion(nm + 1, n2).multiply(recursion(n1, nm));
}
private BigInteger recursePar(final int n1, final int n2) {
if (n2 - n1 < 700) {
return recursion(n1, n2);
}
final int nm = (n1 + n2) >> 1;
RecursiveTask<BigInteger> t = new RecursiveTask<BigInteger>() {
protected BigInteger compute() {
return recursePar(nm + 1, n2);
}
};
if (ForkJoinTask.getPool() == pool) {
t.fork();
} else {
pool.execute(t);
}
return recursePar(n1, nm).multiply(t.join());
}
private BigInteger loop2(int n1, final int n2) {
final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
long p = 1;
BigInteger r = BigInteger.ONE;
while (n1 <= n2) {
if (p <= l) {
p *= n1;
} else {
r = r.multiply(BigInteger.valueOf(p));
p = n1;
}
n1 += 2;
}
return r.multiply(BigInteger.valueOf(p));
}
private BigInteger recursion2(final int n1, final int n2) {
if (n2 - n1 < 65) {
return loop2(n1, n2);
}
final int nm = ((n1 + n2) >> 1) | 1;
return recursion2(nm, n2).multiply(recursion2(n1, nm - 2));
}
private BigInteger split(int n) {
int i = 31 - Integer.numberOfLeadingZeros(n), s = -n, o = 1;
BigInteger p = BigInteger.ONE, r = BigInteger.ONE;
while (i >= 0) {
int h = n >> i;
int o1 = (h - 1) | 1;
if (o < o1) {
p = p.multiply(recursion2(o + 2, o1));
r = r.multiply(p);
}
o = o1;
s += h;
i--;
}
return r.shiftLeft(s);
}
}
Upvotes: 4
Views: 248
Reputation: 7989
After investigation of list of available JVM options I found that one, which was added to 8u40, can switch on/off required intrinsic exactly.
Below are results with -XX:-UseMultiplyToLenIntrinsic
for 8u40 that are very close to results from 8u31:
[info] JavaFactorial.recursion 1000 thrpt 5 12.521 ± 3.352 ops/ms
[info] JavaFactorial.recursion 10000 thrpt 5 0.217 ± 0.069 ops/ms
[info] JavaFactorial.recursionPar 1000 thrpt 5 14.268 ± 8.319 ops/ms
[info] JavaFactorial.recursionPar 10000 thrpt 5 0.286 ± 0.015 ops/ms
[info] JavaFactorial.split 1000 thrpt 5 18.768 ± 4.321 ops/ms
[info] JavaFactorial.split 10000 thrpt 5 0.255 ± 0.076 ops/ms
Upvotes: 2
Reputation: 8777
This benchmark certainly suggests that any code which makes heavy use of BigInteger
multiplication would benefit.
The real answer to your question is to write a benchmark test for your own code and run it under both 8u31
and 8u40
. You want your code to be faster, so benchmark your code and test it. If JRE code or someone elses code is faster that is a sign that your code might benefit but you have to try it and see.
Upvotes: 1