luzik
luzik

Reputation: 703

smart way to do calculation near int overflow

Is there any smart way to fix this issue?

uint32_t a = 16637510;
uint32_t b = 45627362;
uint32_t c = 0;
c = a * 100000 / b //overflows
c = (a * 100/b)*1000 //gives 36000

I need to get result c = 36463 or better 36464. And fast, non float operations are required. CPU is stm32f4

Update:

Accepted answer is converting 100000 to 100000ULL (64 bit) but as @PeterJ suggested (and deleted his answer) using stm32f4 FPU is faster then 64 dividing operations

Timer t;
int i;
t.start();
for(i = 1; i <= 100000; ++i) c = a * 100000ULL / b;
t.stop();
printf("64\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();
t.start();
for(i = 1; i <= 100000; ++i) c = (uint32_t)((float)a * 100000.0f / (float)b);
t.stop();
printf("float\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();

64 takes 0.086669 seconds, du is 57333
float takes 0.017779 seconds, du is 57333

Upvotes: 3

Views: 155

Answers (2)

chux
chux

Reputation: 154169

When 64-bit math is expensive, sometimes a 32-bit only approximate solution can be significantly faster. Depends on the processor/compiler.

Let us see what can be done using only 32-bit math.


b == 100000 == 0x186A0 and let us assume it is fixed - a 17 bit number.

a == 16637510 == 0x00FDDE46, yet OP says it is within +/- 1000. So it is a 24-bit number. b is a 26-bit number. With these restrictions, the final quotient will always be in the 36464 neighborhood (a 16-bit number)

We can scale the product operands a,b to use 16 or so significant bits of a and 16 or so most significant bits of b without losing much significance. Then we have a 16-bit * 16-bit product which will not overflow 32-bit math.

We can take advantage that b has only 12 significant bits leaving code to use up to 20 (32-12) most significant bits of the 24-bit a in the product.

The intermediate product is 41-bit, so we need to scale the multiplication down at least 9 bits.

#define SCALE_A 4
#define SCALE_M 5
// Insure SCALE_A + SCALE_M >= 9 to avoid overflow
// Perhaps other scales like SCALE_A 8, SCALE_M 1 will be faster.

uint32_t scale(uint32_t a, uint32_t b) {
  uint32_t product = (a >> SCALE_A)*(100000 >> SCALE_M);
  uint32_t c = product/(b >> (SCALE_A + SCALE_M));
  return c;
}

If this faster/better for OP? Maybe. Simply another approach to consider. I'll leave it for users to in-line it for performance profiling.

Upvotes: 0

Tim
Tim

Reputation: 4958

How about this?

c = a * 100000ULL / b; // gives 36463

See https://godbolt.org/g/aemCyw for the assembly that gcc generates for this operation and the original c = a * 100000 / b that overflows. Notice that __aeabi_uldivmod is used instead of __aeabi_uidiv.

Upvotes: 1

Related Questions