Reputation: 1979
I need to efficiently add or multiply some constants with a result of type double in a loop, to prevent underflow. For example, with int, multiplying by a power of 2 is fast because the compiler can use a bit shift. Is there a form of constants that allows efficient double addition and multiplication?
Edit: It seems that not many understood my question; apologies for my sloppiness. I will add some code. If a is an int, this (multiplying by a power of 2) will be more efficient
int a = 1;
for(...)
for(...)
a *= somefunction() * 1024;
than when 1024 is replaced with, say, 1023. I am not sure what is best if we want to add to an int, but that is not my interest here. I am interested in the case where a is a double. What forms of constants (e.g., powers of 2) can we add to and multiply with a double efficiently? The constant is arbitrary; it just needs to be large enough to prevent underflow.
This is probably not restricted to C and C++ only, but I do not know of a more appropriate tag.
Upvotes: 5
Views: 893
Reputation: 222938
On most modern processors, simply multiplying by a power of two (e.g., x *= 0x1p10; to multiply by 2^10, or x *= 0x1p-10; to divide by 2^10) will be fast and error-free (unless the result is large enough to overflow or small enough to underflow).
There are some processors with “early outs” for some floating-point operations. That is, they complete the instruction more quickly when certain bits are zero or meet other criteria. However, floating-point addition, subtraction, and multiplication commonly execute in about four CPU cycles, so they are fairly fast even without early outs. Additionally, most modern processors execute several instructions at a time, so other work proceeds while a multiplication is occurring, and they are pipelined, so, commonly, one multiplication can be started (and one finish) in each CPU cycle. (Sometimes more.)
Multiplying by powers of two has no rounding error because the significand (fraction portion of the value) does not change, so the new significand is exactly representable. (Except, multiplying by a value less than 1, bits of the significand can be pushed lower than the limit of the floating-point type, causing underflow. For the common IEEE 754 double format, this does not occur until the value is less than 0x1p-1022.)
Do not use division for scaling (or for reversing the effects of prior scaling). Instead, multiply by the inverse. (To remove a previous scaling of 0x1p57, multiply by 0x1p-57.) This is because division instructions are slow on most modern processors. E.g., 30 cycles is not unusual.
Upvotes: 4
Reputation: 11582
Floating-point addition and multiplication typically take only a few cycles on modern processors.
Perhaps you should step back and think about what the algorithm is doing. In your example you have a doubly nested loop, which means "somefunction()" might get called many, many times.
The common representation of "double" is IEEE 754, which uses 11 bits for the exponent and 52 bits for the mantissa (effectively 53, because except for zero there is an implied '1'). This means you can represent numbers to 53 bits of precision over a range from very tiny to very large: the binary "floating point" can move roughly 1024 (2^10) places to the left or right of the number "1.0". If "somefunction()" is called a thousand times and it always returns a number less than or equal to 0.5, you underflow (every time you multiply by 0.5 you cut your number "a" in half, moving the binary floating point one place to the left).
On x86 you can tell the processor to "flush denormals to zero" by setting a bit in a control register. There is no portable programming interface for doing this; with gcc you use
#include <xmmintrin.h>  // needed for the _MM_* macros
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
Telling the processor to flush denormals to zero will make your code run faster, as the processor doesn't try to represent numbers that are beyond (smaller than) the normals (sub-normals, or denormals). It seems you are trying to maintain precision in the face of an algorithm that produces sub-normals (which forces loss of precision). How best to handle this depends on whether you control "somefunction()" or not. If you do have control of that function, then you might "normalize" the value it returns to something in the range
0.5 <= X <= 2.0
In other words, return values centered around 1.0 and keep track separately of the power of 2 that you need to multiply the final answer to scale it properly.
Upvotes: 2
Reputation: 64904
If you're using SSE, adding constants directly to the exponent field is a legitimate trick (in FPU code it's quite terrible): it typically has double the throughput and four times better latency (except on processors that have a float->int and/or int->float bypass penalty). But since you're only doing this to prevent denormals, why not turn on FTZ (flush to zero) and DAZ (denormals are zero)?
Upvotes: 1
Reputation: 13189
On a gigahertz processor you may save 1 or 2 nanoseconds by optimizing this way (shift vs. multiply). However, the time it takes to load and store from memory is on the order of 100 ns, and a disk access takes about 10 ms. Worrying about the arithmetic operations is pointless compared to optimizing cache usage and disk activity. It will never make a difference in any real production program.
Just to prevent misunderstanding: I'm not saying the difference is small so don't worry about it; I'm saying that it's zero. You can't write a simple program where the difference in ALU time is not completely overlapped with the time the CPU has stalled waiting for memory or I/O.
Upvotes: 0
Reputation: 11920
First get your double into a union so you can pick out the exponent and mantissa parts, then manipulate only the exponent field. Look at the IEEE 754 standard for the exact layout: the binary64 format is 1 sign bit, 11 exponent bits, and 52 mantissa bits (plus an implicit leading '1' for normal values). Note that the order and packing of bit-fields is implementation-defined, so this is not portable.
union int_add_to_double
{
    double this_is_your_double_precision_float;
    struct your_bit_representation_of_double
    {
        unsigned long long mantissa_bits : 52; // fraction; the leading '1' is implicit
        unsigned long long exponent_bits : 11; // biased exponent: adding n multiplies by 2^n
        unsigned long long sign_bit      : 1;  // negative or positive
    } dont_forget_struct_name;
} and_a_union_name;
Upvotes: 2
Reputation: 6357
You can use the standard frexp/ldexp functions, which break an IEEE 754 value into its significand and exponent components:
http://www.cplusplus.com/reference/clibrary/cmath/frexp/
http://www.cplusplus.com/reference/clibrary/cmath/ldexp/
Here is a simple example:
#include <cmath>
#include <iostream>

int main()
{
    double value = 5.4321;
    int exponent;
    double significand = std::frexp(value, &exponent);        // value = significand * 2^exponent
    double result = std::ldexp(significand, exponent + 1);    // scale by one more power of 2
    std::cout << value << " -> " << result << "\n";
    return 0;
}
You can see it run here: http://ideone.com/r3GBy
Upvotes: 0