dogbane
dogbane

Reputation: 274878

Switch to BigInteger if necessary

I am reading a text file which contains numbers in the range [1, 10^100]. I am then performing a sequence of arithmetic operations on each number. I would like to use a BigInteger only if the number is out of the int/long range. One approach would be to count how many digits there are in the string and switch to BigInteger if there are too many. Otherwise I'd just use primitive arithmetic as it is faster. Is there a better way?

Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small? This way we would not have to worry about overflows.

Upvotes: 6

Views: 6062

Answers (7)

Bill K
Bill K

Reputation: 62789

Java is Fast--really really Fast. It's only 2-4x slower than c and sometimes as fast or a tad faster where most other languages are 10x (python) to 100x (ruby) slower than C/Java. (Fortran is also hella-fast, by the way)

Part of this is because it doesn't do things like switch number types for you. It could, but currently it can inline an operation like "a*5" in just a few bytes, imagine the hoops it would have to go through if a was an object. It would at least be a dynamic call to a's multiply method which would be a few hundred / thousand times slower than it was when a was simply an integer value.

Java probably could, these days, actually use JIT compiling to optimize the call better and inline it at runtime, but even then very few library calls support BigInteger/BigDecimal so there would be a LOT of native support, it would be a completely new language.

Also imagine how switching from int to BigInteger instead of long would make debugging video games crazy-hard! (Yeah, every time we move to the right side of the screen the game slows down by 50x, the code is all the same! How is this possible?!??)

Upvotes: 1

Daniel C. Sobral
Daniel C. Sobral

Reputation: 297295

Would it have been possible? Yes. But there are many problems with it.

Consider, for instance, that Java stores references to BigInteger, which is actually allocated on the heap, but store int literals. The difference can be made clear in C:

int i;
BigInt* bi;

Now, to automatically go from a literal to a reference, one would necessarily have to annotate the literal somehow. For instance, if the highest bit of the int was set, then the other bits could be used as a table lookup of some sort to retrieve the proper reference. That also means you'd get a BigInt** bi whenever it overflowed into that.

Of course, that's the bit usually used for sign, and hardware instructions pretty much depend on it. Worse still, if we do that, then the hardware won't be able to detect overflow and set the flags to indicate it. As a result, each operation would have to be accompanied by some test to see if and overflow has happened or will happen (depending on when it can be detected).

All that would add a lot of overhead to basic integer arithmetic, which would in practice negate any benefits you had to begin with. In other words, it is faster to assume BigInt than it is to try to use int and detect overflow conditions while at the same time juggling with the reference/literal problem.

So, to get any real advantage, one would have to use more space to represent ints. So instead of storing 32 bits in the stack, in the objects, or anywhere else we use them, we store 64 bits, for example, and use the additional 32 bits to control whether we want a reference or a literal. That could work, but there's an obvious problem with it -- space usage. :-) We might see more of it with 64 bits hardware, though.

Now, you might ask why not just 40 bits (32 bits + 1 byte) instead of 64? Basically, on modern hardware it is preferable to store stuff in 32 bits increments for performance reasons, so we'll be padding 40 bits to 64 bits anyway.

EDIT

Let's consider how one could go about doing this in C#. Now, I have no programming experience with C#, so I can't write the code to do it, but I expect I can give an overview.

The idea is to create a struct for it. It should look roughly like this:

public struct MixedInt
{
   private int i;
   private System.Numeric.BigInteger bi;

   public MixedInt(string s) 
   {
      bi = BigInteger.Parse(s);
      if (parsed <= int.MaxValue && parsed => int.MinValue)
      {
          i = (int32) parsed;
          bi = 0;
      }   
   }

   // Define all required operations
}

So, if the number is in the integer range we use int, otherwise we use BigInteger. The operations have to ensure transition from one to another as required/possible. From the client point of view, this is transparent. It's just one type MixedInt, and the class takes care of using whatever fits better.

Note, however, that this kind of optimization may well be part of C#'s BigInteger already, given it's implementation as a struct.

If Java had something like C#'s struct, we could do something like this in Java as well.

Upvotes: 0

ewernli
ewernli

Reputation: 38635

Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small?

This is one of the advantage of dynamic typing, but Java is statically typed and prevents this.

In a dynamically type language when two Integer which are summed together would produce an overflow, the system is free to return, say, a Long. Because dynamically typed language rely on duck typing, it's fine. The same can not happen in a statically typed language; it would break the type system.

EDIT

Given that my answer and comment was not clear, here I try to provide more details why I think that static typing is the main issue:

1) the very fact that we speak of primitive type is a static typing issue; we wouldn't care in a dynamically type language.

2) with primitive types, the result of the overflow can not be converted to another type than an int because it would not be correct w.r.t static typing

   int i = Integer.MAX_VALUE + 1; // -2147483648

3) with reference types, it's the same except that we have autoboxing. Still, the addition could not return, say, a BigInteger because it would not match the static type sytem (A BigInteger can not be casted to Integer).

  Integer j = new Integer( Integer.MAX_VALUE ) + 1; // -2147483648

4) what could be done is to subclass, say, Number and implement at type UnboundedNumeric that optimizes the representation internally (representation independence).

 UnboundedNum k = new UnboundedNum( Integer.MAX_VALUE ).add( 1 ); // 2147483648

Still, it's not really the answer to the original question.

5) with dynamic typing, something like

var d = new Integer( Integer.MAX_VALUE ) + 1; // 2147483648

would return a Long which is ok.

Upvotes: -2

tucuxi
tucuxi

Reputation: 17955

The impact of using BigDecimals when something smaller will suffice is surprisingly, err, big: Running the following code

public static class MyLong {
    private long l;
    public MyLong(long l) { this.l = l; }
    public void add(MyLong l2) { l += l2.l; }
}

public static void main(String[] args) throws Exception {
    // generate lots of random numbers
    long ls[] = new long[100000];
    BigDecimal bds[] = new BigDecimal[100000];
    MyLong mls[] = new MyLong[100000];
    Random r = new Random();
    for (int i=0; i<ls.length; i++) {
        long n = r.nextLong();
        ls[i] = n;
        bds[i] = new BigDecimal(n);
        mls[i] = new MyLong(n);
    }
    // time with longs & Bigints
    long t0 = System.currentTimeMillis();
    for (int j=0; j<1000; j++) for (int i=0; i<ls.length-1; i++) {
        ls[i] += ls[i+1];
    }
    long t1 = Math.max(t0 + 1, System.currentTimeMillis());
    for (int j=0; j<1000; j++) for (int i=0; i<ls.length-1; i++) {
        bds[i].add(bds[i+1]);
    }
    long t2 = System.currentTimeMillis();
    for (int j=0; j<1000; j++) for (int i=0; i<ls.length-1; i++) {
        mls[i].add(mls[i+1]);
    }
    long t3 = System.currentTimeMillis();
    // compare times
    t3 -= t2;
    t2 -= t1;
    t1 -= t0;
    DecimalFormat df = new DecimalFormat("0.00");
    System.err.println("long: " + t1 + "ms, bigd: " + t2 + "ms, x"
            + df.format(t2*1.0/t1) + " more, mylong: " + t3 + "ms, x"
            + df.format(t3*1.0/t1) + " more");
}

produces, on my system, this output:

long: 375ms, bigd: 6296ms, x16.79 more, mylong: 516ms, x1.38 more

The MyLong class is there only to look at the effects of boxing, to compare against what you would get with a custom BigOrLong class.

Upvotes: 1

gustafc
gustafc

Reputation: 28905

You could read the values into BigIntegers, and then convert them to longs if they're small enough.

private final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);
private static List<BigInteger> readAndProcess(BufferedReader rd) throws IOException {
    List<BigInteger> result = new ArrayList<BigInteger>();
    for (String line; (line = rd.readLine()) != null; ) {
        BigInteger bignum = new BigInteger(line);
        if (bignum.compareTo(LONG_MAX) > 0) // doesn't fit in a long
            result.add(bignumCalculation(bignum));
        else result.add(BigInteger.valueOf(primitiveCalculation(bignum.longValue())));
    }
    return result;
}
private BigInteger bignumCalculation(BigInteger value) { 
    // perform the calculation 
}
private long primitiveCalculation(long value) {
    // perform the calculation
}

(You could make the return value a List<Number> and have it a mixed collection of BigInteger and Long objects, but that wouldn't look very nice and wouldn't improve performance by a lot.)

The performance may be better if a large amount of the numbers in the file are small enough to fit in a long (depending on the complexity of calculation). There's still risk for overflow depending on what you do in primitiveCalculation, and you've now repeated the code, (at least) doubling the bug potential, so you'll have to decide if the performance gain really is worth it.

If your code is anything like my example, though, you'd probably have more to gain by parallelizing the code so the calculations and the I/O aren't performed on the same thread - you'd have to do some pretty heavy calculations for an architecture like that to be CPU-bound.

Upvotes: 1

polygenelubricants
polygenelubricants

Reputation: 384016

Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small?

Because that is a higher level programming behavior than what Java currently is. The language is not even aware of the BigInteger class and what it does (i.e. it's not in JLS). It's only aware of Integer (among other things) for boxing and unboxing purposes.

Speaking of boxing/unboxing, an int is a primitive type; BigInteger is a reference type. You can't have a variable that can hold values of both types.

Upvotes: 4

Kathy Van Stone
Kathy Van Stone

Reputation: 26291

I suspect the decision to use primitive values for integers and reals (done for performance reasons) made that option not possible. Note that Python and Ruby both do what you ask.

In this case it may be more work to handle the smaller special case than it is worth (you need some custom class to handle the two cases), and you should just use BigInteger.

Upvotes: 6

Related Questions