Quicksilver
Quicksilver

Reputation: 164

Why is the BigInteger.ModPow function in C# much slower than that in Java?

I have found that the BigInteger.ModPow function in C# is very slow compared to the BigInteger.modPow function in Java. This makes me reluctant to use C# to implement functions that perform modular exponentiation.

I have written a test program to prove it.

C#

static void Main(string[] args)
{
    BigInteger num = BigInteger.Parse("444266014606582911577255360081280172978907874637194279031281180366057");
    BigInteger m = 2;
    Console.WriteLine("Start multiply.");
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 3; i <= 200000; i++)
        m *= i;
    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
    stopwatch.Reset();
    Console.WriteLine("Start mod pow.");
    stopwatch.Start();
    for (int i = 0; i < 10; i++)
        BigInteger.ModPow(3, m, num);
    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
}

An equivalent program in Java

public static void main(String[] args) {
    BigInteger num = new BigInteger("444266014606582911577255360081280172978907874637194279031281180366057");
    BigInteger m = BigInteger.TWO;
    System.out.println("Start multiply.");
    long startTime = System.currentTimeMillis();
    for (int i = 3; i <= 200000; i++)
        m = m.multiply(BigInteger.valueOf(i));
    System.out.println(System.currentTimeMillis() - startTime);
    System.out.println("Start mod pow.");
    startTime = System.currentTimeMillis();
    for (int i = 0; i < 10; i++)
        BigInteger.valueOf(3).modPow(m, num);
    System.out.println(System.currentTimeMillis() - startTime);
}

The program consists of 2 parts:

  1. Calculate 200000! to produce a very large number m.
  2. Calculate 3 ^ m mod num 10 times.

You may change the numbers or the loop count to try finding different results.

Here is an execution result on my computer.

Specifications

C#

Start multiply.
19443
Start mod pow.
35292

Java

Start multiply.
14668
Start mod pow.
3462

It shows that the BigInteger.ModPow function in C# is about 10 times slower than that in Java. Does anyone know the reason? Is that a bug?

Upvotes: 3

Views: 1174

Answers (2)

user555045
user555045

Reputation: 64913

You can take a look at the .Net implémentation here and the java ones here.
It appears that the java ones was more heavily studied.

The bottom line is that the .Net source shows a plain vanilla binary exponentiation powermod algorithm, but Java's is quite sophisticated using sliding windows and Montgomery multiplication. The Java code also carefully "leans into" it's intrinsic functions and, in turn, some of those intrinsics are written specifically for big integer arithmetic.


As an experiment, I tried to port the Java code to C#. The goal was to pull apart how much of the performance difference comes from the code (algorithms and implementations), and how much comes from the difference in JIT compiler optimizations.

In order to have something to fairly compare against, this is how the Java version did on my PC:

Start multiply.
7473
Start mod pow.
1406

The times (I also printed a result to verify that the ported code is valid) for the C# versions (running on .NET 6.0), using BigInteger and the version ported from Java, were as follows:

Builtin:
Start multiply.
8059
Start mod pow.
15696
09F59D6D54CE55B44FDF4F4D70E81DBFC8034ECE19339BC7B922F94EA5
Ported from Java:
Start multiply.
8695
Start mod pow.
4971
00000009F59D6D54CE55B44FDF4F4D70E81DBFC8034ECE19339BC7B922F94EA5

I also replicated the experiment I suggested with doing the modpow only once, but that didn't result in anything interesting, just approximately cuts the time of the "modpow phase" in 10.

Some observations just based on the time:

  • That is approximately the same performance ratio between the Java code and the C#-with-builtin-BigInteger code as you observed: multiply a little faster in Java, modPow more than 10x faster in Java.
  • The C# and Java versions of the modpow code, ported with minor differences, so algorithmically identical, performed wildly differently.
  • The ModPow from Java, ported to C#, despite being a lot slower in C# than it was in Java, still had a significant edge over the builtin System.Numerics.BigInteger version in C#.
  • The ported version multiply is not only slower than it was in Java, it's slower than what C# has built in, but not a whole lot.

But why. Unfortunately I will make observations only about the C# version and speculation about the Java version. I did not manage to get the assembly code for the relevant functions out of the JVM. I tried various commands such as explained here and got nothing. Obviously comparing two things when I cannot even look at the second thing is less than ideal. I'd be happy to actually compare the two side by side if someone manages to extract the assembly of the Java methods.

  • I saw a lot of array bounds checks. Array bounds checks in the "default loop" (counting up from 0 to length - 1 and accessing the array directly with the loop counter) are often optimized out, but the Java code has a lot of backwards loops (due to using a big-endian limb order) and in the C# port of that code these bounds checks were not optimized out (hopefully that will be improved in future versions of .NET). It's possible, but I don't know that for sure, that array access in backwards loops are optimized by eg Oracle HotSpot, which would give it a (usually small) edge in nearly every significant function of this code (most of them have a backwards loop in them). That may be enough to explain the performance difference (between the regular C# version and the version that was ported from Java) of the "multiply phase". That still leaves the question of how the Java code is faster when running as actual Java code..

  • As expected, mulAdd is the single most time consuming function during the "modpow phase". My ported version of it looks like this:

     static int mulAdd(int[] _out, int[] _in, int offset, int len, int k)
     {
         ulong kLong = (uint)k;
         ulong carry = 0;
         offset = _out.Length - offset - 1;
         for (long j = len - 1; j >= 0; j--)
         {
             ulong product = (uint)_in[j] * kLong +
                           (uint)_out[offset] + carry;
             _out[offset--] = (int)product;
             carry = (product >> 32);
         }
         return (int)carry;
     }
    

    I think that's a reasonable port, staying close to the original while not using many "Java-isms" (using unsigned integers instead of the & LONG_MASK that Java uses), and the associated assembly code doesn't even look too bad .. not great, it has a bunch of array bounds checks and useless movsxd instructions, but should that really cost a 3x slowdown? mulAdd has a self-time of about 2.4 seconds, so even if it's the other code that is unusually slow compared to what happens in Java, that wouldn't explain the difference: the time due to just mulAdd in C# is already more than the total time that Java spent on the entire "modpow phase".

All in all this really isn't a full explanation, maybe it raises more questions than it answers, at least it's another data point.


The ported code is not included in this question because 1) it's too big, and 2) the source that I ported it from is licensed as GPLv2, which is incompatible with Stack Overflow posts. It wouldn't be a "snippet", which is the exemption that's usually used to justify such inclusions.

Upvotes: 5

Orace
Orace

Reputation: 8359

You can take a look at the .Net implémentation here and the java ones here.
It appears that the java ones was more heavily studied.

Upvotes: 3

Related Questions