Nubcake

Reputation: 469

c - how to split double multiplication?

I'm using avr-gcc, where sizeof(double) and sizeof(float) are both 4, and I'm having trouble getting the correct integer result from double arithmetic:

#include <math.h>
#include <stdint.h>

// x is some value between 8.0 and 9.6103
double x = 9.6103;
uint32_t r = pow(x,2) * 8813377.768984962;

The correct value of r should be 813984763 (the exact product is about 813984763.79, rounded down), but the actual result is 813984768. How can I get the correct integer result?

I've tried to split the calculation like this:

uint32_t r1 = pow(x,2) * 8813377;    // integer part of the constant
double d1 = pow(x,2) * .768984962;  // fractional part of the constant
uint32_t r = r1 + d1;

But this still suffers from precision issues, i.e. I can't get 813984763 exactly. I'm only interested in the integer part of the result being correct. Any ideas?

Upvotes: 0

Views: 131

Answers (2)

KamilCuk

Reputation: 141890

You could scale everything up and use 128-bit integers for the arithmetic. 128 bits give so much headroom that you can turn every operand into an integer and just multiply.

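// Assumes a GCC target that provides unsigned __int128; needs <math.h> and <stdint.h>
typedef unsigned __int128 uint128_t;
double x = 9.6103;
uint128_t y = lround(x * 10000); // = 96103, i.e. x scaled by 10^4 (rounded, not truncated)
uint128_t c = 8813377768984962;  // = 8813377.768984962 * 10^9
uint32_t r = y * y * c / 10000 / 10000 / 1000000000; // undo the 10^4 * 10^4 * 10^9 scaling
// max y * y * c = 96103 * 96103 * 8813377768984962 =
//             = 81398476378849607561973858
// UINT128_MAX = 340282366920938463463374607431768211455
// ^^ is far larger, so it will not overflow.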
Your platform most probably does not support the __uint128_t GCC extension, so you could write your own library for it. There are plenty of 128-bit integer libraries in C++ on GitHub: port one to C (or find one written in C) and use it.


Well, I had some free time and I always wanted a C uint128 library, so I took https://github.com/calccrypto/uint128_t and ported it to C. I then wrote a program that does the same computation as above, compiled it for the ATmega128 with avr-gcc -Os, and ran avr-nm -td --size-sort on the result. Below are the five biggest symbols; the whole program has about 12 KB of .text, so this solution needs a fair amount of flash.

00000642 T how_to_split_double_multiplication
00000706 T kuint128_rshift
00000762 T kuint128_lshift
00003104 T kuint128_mul
00004594 T kuint128_divmod

Upvotes: 0

R.. GitHub STOP HELPING ICE

Reputation: 215577

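A float cannot even represent the value you need (813984763), let alone carry the precision needed for the intermediate calculation. And as you've noted, avr-gcc has wrongly made double the same as float (the C standard requires DBL_DIG, the number of guaranteed decimal digits of double, to be at least 10, while a 4-byte float only guarantees 6).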

The closest representable values in float are:

  • Below: 813984704
  • Above: 813984768 (closer)

Upvotes: 1
