Reputation: 9269
I am trying to optimize the following operation, where I have a large number of unsigned short inputs that need to be scaled down by a certain factor. Is there a way to optimize it to not use floating point operations?
unsigned short val = 65523U;
val = val * 0.943;
Note
I will be running the above operation on a DSP where floating point operations are costly.
Upvotes: 5
Views: 2270
Reputation: 140168
You could multiply by 943, then divide by 1000. You'd trade the floating point multiplication for an integer multiplication plus a Euclidean division.
unsigned short val = 65523U;
val = (val*943UL)/1000;
I get: 61788
It works (even on systems where int is 16-bit wide) as long as val*943 is within unsigned long capacity (unsigned long long could be used to extend the limit even further).
EDIT: You could even avoid the division by computing the ratio times a power of 2; I chose 16. So .943*(1<<16) is 61800.448, and you can do one multiplication and one shift operation (very fast). It's better to use unsigned long long at this point because the intermediate result can get very large:
val = (val*61800UL)>>16;
This gives roughly the same result: 61787. Use 61801 instead and you get 61788.
Upvotes: 1
Reputation: 33618
You can multiply with an integer approximation of 0.943 * 2^16, then divide by 2^16 which your compiler should transform into a right shift. Assuming 16-bit shorts and at least 32-bit ints:
val = ((unsigned)val * 61800) / 65536;
Depending on your exact requirements, you might get more accurate results by rounding to the nearest integer:
val = ((unsigned)val * 61800 + 32768) / 65536;
Any other power of two will work. On a 64-bit platform, you could use 2^48 for more precision.
Upvotes: 8
Reputation: 234705
With a platform that uses a 32 bit int or higher, using
int val = 65523U;
val = val * 943 / 1000;
would be hard to beat. Convert the truncation to German rounding (round half up) by altering the coefficients. If your system has a 16 bit int then you could use a long (note that the multiplication by 943 and division by 1000 would then take place in long arithmetic), but the solution would require profiling.
Dividing by 1000 first would cause truncation issues, so you must multiply first; a larger type is then required to accommodate the larger intermediate value.
Upvotes: 1
Reputation: 1099
The multiply / divide approach is good. But even better, you can avoid the divide.
unsigned short has a range of 0 ... 65535.
In C, arithmetic on types narrower than int is promoted to int first (32 bits on most platforms), and the result is cast back to 16 bits when it is stored into a short, truncating the value. You want to avoid that truncation when multiplying a short by a large number, since the output is a short. The casts below make explicit what's going on and ensure the compiler performs no unintended conversions.
unsigned short val = 65523U;
const unsigned int mult = 65536 * 0.943; // expressed as a fraction of 2^16
unsigned short output = (unsigned short)(((unsigned int)val * mult) >> 16);
So this casts the value to a 32 bit unsigned int (to guarantee control of the types), multiplies it by the ratio scaled by 2^16, then right-shifts it by 16 to bring it back to the correct scale.
Upvotes: 4
Reputation: 213810
The simplest way is to just use a 32 bit type that can hold the result:
uint16_t val = 65523U;
val = (uint_fast32_t)val * 943 / 1000;
Or if you want more type correctness and portability, while at the same time allowing the compiler to use the best possible integer type for the task:
#include <stdint.h>
uint_fast16_t val = UINT16_C(65523);
val = (uint_fast16_t) ( (uint_fast32_t)val * (uint_fast32_t)943 / (uint_fast32_t)1000 );
Upvotes: 8