Romain Pereira

Reputation: 68

Binary to decimal (on huge numbers)

I am building a C library for big integer numbers. Basically, I'm looking for a fast algorithm to convert any integer from its binary representation to a decimal one.

I looked at the JDK's BigInteger.toString() implementation, but it seems quite heavyweight to me, since it was written to convert the number to any radix (it uses one division per digit, which should be pretty slow when dealing with thousands of digits).

So if you have any documentation or knowledge to share about this, I would be glad to read it.

EDIT: to be more precise about my question:

How do I convert the integer represented by the N bytes at address P (let's say in little endian, to make things simpler) to a C string?

Example:

Still, thanks for your answers.

Upvotes: 2

Views: 3758

Answers (2)

user14886226


I recently got the challenge to print a big Mersenne prime: 2**82589933-1. On my CPU that takes ~40 minutes with apcalc and ~120 minutes with Python 2.7. It's a number with a bit over 24 million digits (24,862,048 of them).

Here is my own little C code for the conversion:

// print 2**82589933-1

#include <stdio.h>
#include <stdint.h>
#include <string.h>

// array sizes at file scope must be integer constant expressions in C
// (a const uint32_t computed with log() would make num and out illegal
// file-scope VLAs), so the sizes are enum constants;
// 30103/100000 slightly overestimates log10(2), leaving a little slack
enum {
  exponent = 82589933,
  // exponent = 100,  // outputs 1267650600228229401496703205375
  blocks = (exponent + 31) / 32,
  digits = (int)((int64_t)exponent * 30103 / 100000) + 10
};

uint32_t num[2][blocks];
char out[digits + 1];

// blocks : number of uint32_t in num1 and num2
// num1   : number to convert
// num2   : free space
// out    : end of output buffer
void conv(uint32_t blocks, uint32_t *num1, uint32_t *num2, char *out) {
  if (blocks == 0) return;
  const uint32_t div = 1000000000;
  uint64_t t = 0;
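  // one step of long division: scan the blocks most significant first,
  // building the quotient in num2 and leaving the remainder in t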
  for (uint32_t i = 0; i < blocks; ++i) {
    t = (t << 32) + num1[i];
    num2[i] = t / div;
    t = t % div;
  }
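  // t is now the remainder (< 10^9): write its nine digits,
  // least significant first, moving backwards through the buffer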
  for (int i = 0; i < 9; ++i) {
    *out-- = '0' + (t % 10);
    t /= 10;
  }
  if (num2[0] == 0) {
    --blocks;
    num2++;
  }
  conv(blocks, num2, num1, out);
}

int main() {
  // prepare the number 2**exponent - 1, most significant block first
  // (note: assumes exponent % 32 != 0, true for both values above)
  uint32_t t = exponent % 32;
  num[0][0] = (1LLU << t) - 1;
  memset(&num[0][1], 0xFF, (blocks - 1) * sizeof(uint32_t));
  // prepare output
  memset(out, '0', digits);
  out[digits] = 0;
  // convert to decimal
  conv(blocks, num[0], num[1], &out[digits - 1]);
  // output number
  char *res = out;
  while(*res == '0') ++res;
  printf("%s\n", res);
  return 0;
}

The conversion is destructive and tail recursive. In each step it divides num1 by 1_000_000_000 and stores the quotient in num2; the nine digits of the remainder are written to out. Then it calls itself with num1 and num2 swapped, shortened by one block (blocks is decremented) whenever the leading block has become zero. out is filled from back to front, so you have to allocate it large enough and then strip the leading zeroes. One caveat: at roughly 2.8 million recursion levels this relies on the compiler turning the tail call into a loop (e.g. GCC or Clang with optimization enabled); an unoptimized build may overflow the stack.

Python seems to use a similar mechanism for converting big integers to decimal.

Want to do better?

For a large number like in my case, each division by 1_000_000_000 takes rather long. At a certain size a divide & conquer algorithm does better. In my case the first step would be a division by 10 ^ 16777216 to split the number into quotient and remainder. Then convert each part separately. Each part is still big, so split again at 10 ^ 8388608. Recursively keep splitting until the numbers are small enough, say maybe 1024 digits each; those you convert with the simple algorithm above. The right definition of "small enough" would have to be tested, 1024 is just a guess.
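To give a feel for that first split, here is a toy sketch shrunk down to a 128-bit value: split at a power of ten, convert quotient and remainder independently, and zero-pad the low half. A real implementation would recurse on big-integer halves instead; this one assumes the GCC/Clang __uint128_t extension, and print_u128 is just an illustrative name.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

// split at 10^19, the largest power of ten fitting in a uint64_t;
// works while the quotient still fits in 64 bits, a real
// implementation would recurse on the halves instead
static void print_u128(__uint128_t x) {
  const uint64_t p19 = 10000000000000000000ULL;
  uint64_t high = (uint64_t)(x / p19);  // leading digits
  uint64_t low = (uint64_t)(x % p19);   // trailing 19 digits
  if (high)
    printf("%" PRIu64 "%019" PRIu64, high, low);  // zero-pad the low half
  else
    printf("%" PRIu64, low);
}

int main(void) {
  print_u128(((__uint128_t)1 << 100) - 1);  // 2**100-1
  printf("\n");  // prints 1267650600228229401496703205375, matching exponent = 100 above
  return 0;
}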

While the long division of two big integer numbers is expensive, much more so than a division by 1_000_000_000, the time spent there is saved because each separate chunk then requires far fewer divisions by 1_000_000_000 to convert to decimal.

And once you have split the problem into separate, independent chunks, it's only a tiny step to spreading those chunks across multiple cores, which would speed up the conversion by another big factor. It looks like apcalc uses divide & conquer but not multi-threading.

Upvotes: 0

Stefan Haustein

Reputation: 18793

The reason the BigInteger.toString method looks heavy is that it does the conversion in chunks.

A trivial algorithm would repeatedly take the last digit (the number modulo the radix) and then divide the whole big integer by the radix until nothing is left, as in the sketch below.
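Sketched in C on a plain machine word, the trivial loop looks like this; for a big integer, the % 10 and /= 10 each become a full big-integer division, one pair per output digit, which is exactly what makes the naive approach slow. (to_decimal is just an illustrative name.)

#include <stdio.h>

static void to_decimal(unsigned long long n, char *buf) {
  char *p = buf;
  do {
    *p++ = (char) ('0' + n % 10);  // take the last decimal digit
    n /= 10;                       // divide the whole number by the radix
  } while (n != 0);
  *p = '\0';
  // digits came out least significant first; reverse them in place
  for (char *lo = buf, *hi = p - 1; lo < hi; ++lo, --hi) {
    char tmp = *lo; *lo = *hi; *hi = tmp;
  }
}

int main(void) {
  char buf[21];  // 2^64 - 1 has 20 digits, plus the terminator
  to_decimal(18446744073709551615ULL, buf);
  printf("%s\n", buf);  // prints 18446744073709551615
  return 0;
}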

One problem with this is that a big integer division is quite expensive, so the number is subdivided into chunks that can be processed with regular integer division (as opposed to BigInteger division):

static String toDecimal(BigInteger bigInt) {
  BigInteger chunker = BigInteger.valueOf(1000000000);
  StringBuilder sb = new StringBuilder();
  do {
    // one expensive BigInteger division splits off the lowest 9 digits
    int current = bigInt.mod(chunker).intValue();
    bigInt = bigInt.divide(chunker);
    // those 9 digits are then emitted with cheap int arithmetic
    for (int i = 0; i < 9; i++) {
      sb.append((char) ('0' + current % 10));
      current /= 10;
      if (current == 0 && bigInt.signum() == 0) {
        break;  // avoid leading zeroes on the most significant chunk
      }
    }
  } while (bigInt.signum() != 0);
  return sb.reverse().toString();
}

That said, for a fixed radix you are probably even better off porting the "double dabble" algorithm to your needs, as suggested in the comments: https://en.wikipedia.org/wiki/Double_dabble
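For a feel of how double dabble looks in C, here is a small sketch for a single 32-bit word: before each left shift, add 3 to every BCD digit that is 5 or more, then shift the whole digit register left by one bit and pull in the next input bit. A big-number version would run the same add-3-and-shift pass over a longer digit array, feeding in the input blocks bit by bit; double_dabble32 is just an illustrative name.

#include <stdio.h>
#include <stdint.h>

// convert a 32-bit value to decimal with double dabble:
// 32 bits need at most 10 decimal digits, one per array slot
static void double_dabble32(uint32_t n, char *buf) {
  uint8_t bcd[10] = {0};  // BCD digits, most significant first
  for (int bit = 31; bit >= 0; --bit) {
    // add 3 to every digit >= 5 so the doubling below carries correctly
    for (int i = 0; i < 10; ++i)
      if (bcd[i] >= 5) bcd[i] += 3;
    // shift the whole register left one bit, nibble by nibble
    for (int i = 0; i < 9; ++i)
      bcd[i] = (uint8_t)(((bcd[i] << 1) & 0xF) | (bcd[i + 1] >> 3));
    bcd[9] = (uint8_t)(((bcd[9] << 1) & 0xF) | ((n >> bit) & 1));
  }
  int i = 0;
  while (i < 9 && bcd[i] == 0) ++i;  // skip leading zeroes, keep one digit
  for (; i < 10; ++i) *buf++ = (char)('0' + bcd[i]);
  *buf = '\0';
}

int main(void) {
  char buf[11];
  double_dabble32(4294967295u, buf);
  printf("%s\n", buf);  // prints 4294967295
  return 0;
}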

Upvotes: 6
