Reputation: 703
Is there any smart way to fix this issue?
uint32_t a = 16637510;
uint32_t b = 45627362;
uint32_t c = 0;
c = a * 100000 / b //overflows
c = (a * 100/b)*1000 //gives 36000
I need to get result c = 36463 or better 36464. And fast, non float operations are required. CPU is stm32f4
Update:
Accepted answer is converting 100000 to 100000ULL (64 bit) but as @PeterJ suggested (and deleted his answer) using stm32f4 FPU is faster then 64 dividing operations
Timer t;
int i;
t.start();
for(i = 1; i <= 100000; ++i) c = a * 100000ULL / b;
t.stop();
printf("64\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();
t.start();
for(i = 1; i <= 100000; ++i) c = (uint32_t)((float)a * 100000.0f / (float)b);
t.stop();
printf("float\ttakes %f seconds, du is %d\n", t.read(), c);
t.reset();
64 takes 0.086669 seconds, du is 57333
float takes 0.017779 seconds, du is 57333
Upvotes: 3
Views: 155
Reputation: 154169
When 64-bit math is expensive, sometimes a 32-bit only approximate solution can be significantly faster. Depends on the processor/compiler.
Let us see what can be done using only 32-bit math.
b == 100000 == 0x186A0
and let us assume it is fixed - a 17 bit number.
a == 16637510 == 0x00FDDE46
, yet OP says it is within +/- 1000. So it is a 24-bit number. b
is a 26-bit number. With these restrictions, the final quotient will always be in the 36464 neighborhood (a 16-bit number)
We can scale the product operands a,b
to use 16 or so significant bits of a
and 16 or so most significant bits of b
without losing much significance. Then we have a 16-bit * 16-bit product which will not overflow 32-bit math.
We can take advantage that b
has only 12 significant bits leaving code to use up to 20 (32-12) most significant bits of the 24-bit a
in the product.
The intermediate product is 41-bit, so we need to scale the multiplication down at least 9 bits.
#define SCALE_A 4
#define SCALE_M 5
// Insure SCALE_A + SCALE_M >= 9 to avoid overflow
// Perhaps other scales like SCALE_A 8, SCALE_M 1 will be faster.
uint32_t scale(uint32_t a, uint32_t b) {
uint32_t product = (a >> SCALE_A)*(100000 >> SCALE_M);
uint32_t c = product/(b >> (SCALE_A + SCALE_M));
return c;
}
If this faster/better for OP? Maybe. Simply another approach to consider. I'll leave it for users to in-line it for performance profiling.
Upvotes: 0
Reputation: 4958
How about this?
c = a * 100000ULL / b; // gives 36463
See https://godbolt.org/g/aemCyw for the assembly that gcc generates for this operation and the original c = a * 100000 / b
that overflows. Notice that __aeabi_uldivmod
is used instead of __aeabi_uidiv
.
Upvotes: 1