Is this benchmark comparing Java 8u40 and 8u31 for multiplying BigInteger values correct?

Question

I saw that JDK-8055494 : Add C2 x86 intrinsic for BigInteger::multiplyToLen() method was fixed for 8u40 update, but I didn't expect that it will speed up BigInteger multiplication in 2x - 3x times.

Here are results of benchmarks which calculate factorial by different formulas on both JVM updates:

8u31

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      13.994 ±     0.175  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.202 ±     0.054  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      12.066 ±     8.011  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.253 ±     0.055  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      18.255 ±     2.656  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.286 ±     0.063  ops/ms

8u40

[info] Benchmark                             (n)   Mode  Cnt       Score       Error   Units
[info] JavaFactorial.recursion              1000  thrpt    5      33.704 ±     0.445  ops/ms
[info] JavaFactorial.recursion             10000  thrpt    5       0.428 ±     0.199  ops/ms
[info] JavaFactorial.recursionPar           1000  thrpt    5      38.170 ±     0.433  ops/ms
[info] JavaFactorial.recursionPar          10000  thrpt    5       0.557 ±     0.030  ops/ms
[info] JavaFactorial.split                  1000  thrpt    5      46.447 ±    11.582  ops/ms
[info] JavaFactorial.split                 10000  thrpt    5       0.586 ±     0.154  ops/ms

Should I expect that this JDK improvement leads to run my code faster or is this yet another casual benchmark?

Edit

How to verify my findings to prove that such speed up was due that C2 intrinsic?

Code of tested functions:

@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(1)
public class JavaFactorial {
    @Param({"10", "100", "1000", "10000"})
    public int n;

    private static ForkJoinPool pool = new ForkJoinPool();

    @Benchmark
    public BigInteger loop() {
        return n > 20 ? loop(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursion() {
        return n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger recursionPar() {
        return n > 20 ? recursePar(1, n) : BigInteger.valueOf(fastLoop(1, n));
    }

    @Benchmark
    public BigInteger split() {
        return n > 180 ? split(n) : (n > 20 ? recursion(1, n) : BigInteger.valueOf(fastLoop(1, n)));
    }

    private long fastLoop(final int n1, int n2) {
        long p = n1;
        while (n2 > n1) {
            p = p * n2;
            n2--;
        }
        return p;
    }

    private BigInteger loop(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1++;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        return recursion(nm + 1, n2).multiply(recursion(n1, nm));
    }

    private BigInteger recursePar(final int n1, final int n2) {
        if (n2 - n1 < 700) {
            return recursion(n1, n2);
        }
        final int nm = (n1 + n2) >> 1;
        RecursiveTask t = new RecursiveTask() {
            protected BigInteger compute() {
                return recursePar(nm + 1, n2);
            }
        };
        if (ForkJoinTask.getPool() == pool) {
            t.fork();
        } else {
            pool.execute(t);
        }
        return recursePar(n1, nm).multiply(t.join());
    }

    private BigInteger loop2(int n1, final int n2) {
        final long l = Long.MAX_VALUE >> (32 - Integer.numberOfLeadingZeros(n2));
        long p = 1;
        BigInteger r = BigInteger.ONE;
        while (n1 <= n2) {
            if (p <= l) {
                p *= n1;
            } else {
                r = r.multiply(BigInteger.valueOf(p));
                p = n1;
            }
            n1 += 2;
        }
        return r.multiply(BigInteger.valueOf(p));
    }

    private BigInteger recursion2(final int n1, final int n2) {
        if (n2 - n1 < 65) {
            return loop2(n1, n2);
        }
        final int nm = ((n1 + n2) >> 1) | 1;
        return recursion2(nm, n2).multiply(recursion2(n1, nm - 2));
    }

    private BigInteger split(int n) {
        int i = 31 - Integer.numberOfLeadingZeros(n), s = -n, o = 1;
        BigInteger p = BigInteger.ONE, r = BigInteger.ONE;
        while (i >= 0) {
            int h = n >> i;
            int o1 = (h - 1) | 1;
            if (o < o1) {
                p = p.multiply(recursion2(o + 2, o1));
                r = r.multiply(p);
            }
            o = o1;
            s += h;
            i--;
        }
        return r.shiftLeft(s);
    }
}

Andriy Plokhotnyuk · Accepted Answer

After investigation of list of available JVM options I found that one, which was added to 8u40, can switch on/off required intrinsic exactly.

Below are results with -XX:-UseMultiplyToLenIntrinsic for 8u40 that are very close to results from 8u31:

    [info] JavaFactorial.recursion              1000  thrpt    5      12.521 ±      3.352  ops/ms
    [info] JavaFactorial.recursion             10000  thrpt    5       0.217 ±      0.069  ops/ms
    [info] JavaFactorial.recursionPar           1000  thrpt    5      14.268 ±      8.319  ops/ms
    [info] JavaFactorial.recursionPar          10000  thrpt    5       0.286 ±      0.015  ops/ms
    [info] JavaFactorial.split                  1000  thrpt    5      18.768 ±      4.321  ops/ms
    [info] JavaFactorial.split                 10000  thrpt    5       0.255 ±      0.076  ops/ms

Is this benchmark comparing Java 8u40 and 8u31 for multiplying BigInteger values correct?

Answers (2)

Related Questions